StarDict is an awesome dictionary utility for Linux, with several dictionaries available for Mandarin.
I recently started learning Cantonese, and I couldn't find any StarDict dictionaries for it.
There is a Firefox extension called CantoFish which uses dictionary data from CantoDict and Adsotrans.
Unfortunately, for some reason CantoFish doesn't run on my machine.
Besides, StarDict's mouse-over translation is really awesome.
So I looked into converting CantoFish's dictionary into StarDict format, which turned out to be pretty easy.
CantoFish's dictionary lives in the Firefox profile, at extensions/cantofish@cantofish.net/chrome/content/canto.dat
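If you aren't sure which Firefox profile holds the extension, a small search can find the file. This is a minimal sketch that assumes the usual Linux profiles directory, ~/.mozilla/firefox, and the extension path quoted above; the function name find_canto_dat is made up for illustration.

```python
import glob
import os


def find_canto_dat(profiles_root=None):
    """Return every canto.dat under the given Firefox profiles directory.

    Assumes the Linux default profiles root (~/.mozilla/firefox) and the
    CantoFish extension layout; both may differ on other setups.
    """
    if profiles_root is None:
        profiles_root = os.path.expanduser("~/.mozilla/firefox")
    # One '*' per profile folder, then the extension's fixed sub-path.
    pattern = os.path.join(
        profiles_root, "*", "extensions", "cantofish@cantofish.net",
        "chrome", "content", "canto.dat",
    )
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for path in find_canto_dat():
        print(path)
```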
As far as I know, this data is available under the GPL, or possibly under a non-commercial attribution license (e.g. for the Adso data).
canto.dat is a tab-separated text file, which is really easy to read. I was pleasantly surprised by this!
StarDict dictionaries can then be built from a suitable tab-separated file using tabfile, which ships in the stardict-tools package.
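As far as I know, each line of tabfile's input is one entry: the headword, a tab, then the definition, where a literal backslash-n (the two characters \ and n) marks a line break inside the definition. That is the convention the conversion script writes. A small helper that builds one such line might look like this; tabfile_line is a hypothetical name, and it simply refuses fields containing tabs or real newlines rather than guessing how to escape them.

```python
def tabfile_line(headword, definition_lines):
    """Build one line of tabfile input (hypothetical helper).

    Assumed format: headword, a TAB, then the definition, where a literal
    backslash-n (two characters) marks a line break inside the definition.
    """
    for text in [headword, *definition_lines]:
        # A raw tab or newline would break the one-entry-per-line layout.
        if "\t" in text or "\n" in text:
            raise ValueError("fields must not contain tabs or newlines: %r" % text)
    return headword + "\t" + "\\n".join(definition_lines) + "\n"
```

For example, tabfile_line("啱", ["ngaam1", "correct (Cantonese)"]) returns "啱\tngaam1\\ncorrect (Cantonese)\n" as a Python literal: the pronunciation and the meaning end up on separate lines of the StarDict entry.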
I used the following script to convert canto.dat into an input file for tabfile, and then the rest is easy:
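```python
#!/usr/bin/env python3
# Convert CantoFish's canto.dat into tab-separated input for StarDict's tabfile.
import sys


def go(cantopath, outpath):
    print(cantopath)
    # canto.dat is assumed to be UTF-8 encoded
    with open(cantopath, encoding="utf-8") as cantofd, \
            open(outpath, "w", encoding="utf-8") as outfd:
        next(cantofd, None)  # skip the first line
        for line in cantofd:
            line = line.strip()
            if not line:
                continue  # ignore blank lines instead of crashing on them
            # headword: everything before the first space
            cantocharacters = line.split(" ")[0]
            # pronunciation: the text inside the first [...] pair
            cantopronunciation = line.split("[")[1].split("]")[0].strip()
            # definitions: everything after the second ']'; each '/' becomes a
            # line break, written as a literal \n so tabfile keeps one entry per line
            trans = " ".join(line.split("]")[2:]).replace("/", "\n").strip().replace("\n", "\\n")
            outfd.write(cantocharacters + "\t" + cantopronunciation + "\\n" + trans + "\n")


if __name__ == "__main__":
    go(sys.argv[1], sys.argv[2])
```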
The script expects the path of canto.dat as the first argument, and the name of the output file as the second.
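To see what the script does to a single entry, here is its per-line logic pulled out into a function (convert_line is a hypothetical name). The sample input line is made up too: it only assumes the shape the script expects, i.e. headword before the first space, pronunciation inside the first [...] pair, and '/'-separated definitions after the second ']'. The real canto.dat layout may differ in details.

```python
def convert_line(line):
    """Apply the conversion script's per-line transformation to one line."""
    line = line.strip()
    cantocharacters = line.split(" ")[0]
    cantopronunciation = line.split("[")[1].split("]")[0].strip()
    trans = " ".join(line.split("]")[2:]).replace("/", "\n").strip().replace("\n", "\\n")
    return cantocharacters + "\t" + cantopronunciation + "\\n" + trans + "\n"


# Hypothetical input line in the assumed layout.
sample = "嗰 嗰 [go2] [go2] /that (Cantonese)/"
converted = convert_line(sample)
```

Here converted is "嗰\tgo2\\nthat (Cantonese)\n" as a Python literal: headword, tab, pronunciation, a literal \n, then the definition.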
Then you can just process the output using tabfile, and copy the resulting files into an appropriate subfolder of /usr/share/stardict/dic.
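The copying step can be scripted as well. This is a minimal sketch that assumes tabfile has already been run and wrote its output next to its input file, named after the input's stem; install_dictionary and the folder names are hypothetical, the destination defaults to /usr/share/stardict/dic (the usual system-wide location), and writing there normally needs root.

```python
import os
import shutil

# Files that may make up a StarDict dictionary: .ifo, .idx, .dict (possibly
# dictzip-compressed as .dict.dz) and an optional .syn.
STARDICT_SUFFIXES = (".ifo", ".idx", ".dict", ".dict.dz", ".syn")


def install_dictionary(src_dir, stem, subfolder, dest_root="/usr/share/stardict/dic"):
    """Copy <stem>.ifo/.idx/.dict[.dz]/.syn from src_dir into dest_root/subfolder.

    Returns the list of destination paths that were actually copied.
    """
    dest = os.path.join(dest_root, subfolder)
    os.makedirs(dest, exist_ok=True)
    copied = []
    for suffix in STARDICT_SUFFIXES:
        src = os.path.join(src_dir, stem + suffix)
        if os.path.exists(src):
            # copy2 keeps timestamps and returns the new file's path
            copied.append(shutil.copy2(src, dest))
    return copied
```

For example, install_dictionary(".", "cantodict", "cantodict") would copy ./cantodict.ifo and its companions into /usr/share/stardict/dic/cantodict.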
I use a Mac. Any chance you could email me the Cantonese dictionary in StarDict format? Many thanks!!!
Beautiful and works like a charm! Thanks!
Well, thanks. The script worked fine, but now I've got a file like this:
啲 di1\nplural prefix (Cantonese)
啱 ngaam1\ncorrect (Cantonese)
嗰 go2\nthat (Cantonese)
……
How do I get it into StarDict? As far as I know, StarDict needs a folder with three files, .idx, .oft and .ifo… how do I create them?
> How do I get it into StarDict? As far as I know, StarDict needs a folder with three files, .idx, .oft and .ifo… how do I create them?
Use tabfile:
- sudo apt-get install stardict-tools
- tabfile myfile.dat
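As far as I know, a StarDict dictionary consists of an .ifo file plus a matching .idx and .dict (or dictzip-compressed .dict.dz) file, which is what tabfile is meant to generate. To check a folder before pointing StarDict at it, a helper along these lines could be used; missing_stardict_files is a hypothetical name, and the required-file list is the assumption just stated.

```python
import os


def missing_stardict_files(folder, stem):
    """List the core StarDict files missing for <stem> in folder (hypothetical helper).

    Assumed requirement: <stem>.ifo, <stem>.idx and either <stem>.dict or
    the dictzip-compressed <stem>.dict.dz.
    """
    def present(name):
        return os.path.isfile(os.path.join(folder, name))

    missing = [stem + s for s in (".ifo", ".idx") if not present(stem + s)]
    # Either the plain or the compressed dictionary body is enough.
    if not (present(stem + ".dict") or present(stem + ".dict.dz")):
        missing.append(stem + ".dict[.dz]")
    return missing
```

An empty result means the folder has everything the assumed layout requires.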
More info on this page: http://stardict.sourceforge.net/HowToCreateDictionary
hughperkins – many thanks!
I had to make tabfile executable first; this may be obvious to skilled users, but I finally managed to create the dictionary files and import them into StarDict! I've learned a lot… thanks again for sharing the valuable code to make it happen!
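For anyone else who hits this: on Linux the usual fix is chmod +x on the tabfile binary. From Python, roughly the same permission change can be made as below; make_executable is a hypothetical helper, and where tabfile lives on your system is up to your installation.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others, roughly like `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```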