Converting CantoFish dictionary for stardict

Stardict is an awesome dictionary utility for Linux, with several dictionaries available for Mandarin.

I started to learn Cantonese recently, and I couldn't find any stardict dictionaries for Cantonese.

There is a plugin for Firefox called CantoFish, which uses dictionary data from CantoDict and adsotrans.

Unfortunately, for some reason CantoFish doesn't run on my machine.

And also, stardict’s mouse-over translation is really awesome.

So I took a look at converting CantoFish’s dictionary into stardict format, which turned out to be pretty easy.

CantoFish’s dictionary is in the Firefox profile, in extensions/cantofish@cantofish.net/chrome/content/canto.dat

As far as I know, this data is available under the GPL, or possibly under a non-commercial attribution license (e.g. adso).

canto.dat is a tab-separated text file, which is really easy to read. I was pleasantly surprised by this!

Then, stardict dictionaries can be created using tabfile, which is in stardict-tools, from an appropriate tab-separated file.

I used the following script to convert canto.dat into an input file for tabfile, and then the rest is easy:

#!/usr/bin/env python3

import sys

def go(cantopath, outpath):
    print(cantopath)
    # Assumption: canto.dat is UTF-8 encoded; change the encoding if yours is not.
    with open(cantopath, "r", encoding="utf-8") as cantofd, \
         open(outpath, "w", encoding="utf-8") as outfd:
        # Skip the first line of the file.
        next(cantofd, None)
        for line in cantofd:
            line = line.strip()
            # The characters are the first space-separated field.
            cantocharacters = line.split(" ")[0]
            # The pronunciation is the content of the first [...] pair.
            cantopronunciation = line.split("[")[1].split("]")[0].strip()
            # Everything after the second "]" holds the translations, separated by "/".
            # Each one goes on its own line, written as a literal \n for tabfile.
            trans = ' '.join(line.split("]")[2:]).replace('/', '\n').strip().replace('\n', '\\n')
            outfd.write(cantocharacters + '\t' + cantopronunciation + '\\n' + trans + '\n')

go(sys.argv[1], sys.argv[2])

The script expects the path of canto.dat as the first argument, and the name of the output file as the second.
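To make the per-line transformation a bit more concrete, here is a small sketch of what happens to a single entry. The sample line is made up for illustration and not copied from the real canto.dat: I'm assuming a layout with the characters first, then two bracketed fields, then slash-separated definitions, which is what the split calls in the script imply. The function name convert_line is hypothetical too.

```python
def convert_line(line):
    """Turn one canto.dat-style entry into a 'word<TAB>definition' line for tabfile."""
    line = line.strip()
    characters = line.split(" ")[0]
    pronunciation = line.split("[")[1].split("]")[0].strip()
    # Everything after the second "]" is treated as the slash-separated definitions.
    senses = " ".join(line.split("]")[2:]).replace("/", "\n").strip()
    # The definition must contain a literal backslash-n, not a real newline.
    return characters + "\t" + pronunciation + "\\n" + senses.replace("\n", "\\n")


# Hypothetical sample entry (assumed layout, see above).
sample = "啲 [di1] [di1] /plural prefix (Cantonese)/"
print(convert_line(sample))
```

The result is one line with the characters, a tab, the pronunciation, and then the definitions, each separated by a literal \n.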

Then you can just process the output using tabfile, and copy the resulting files into an appropriate subfolder of /usr/share/stardict/dic.
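Before handing the file to tabfile, it can be worth a quick sanity check. As far as I understand it, tabfile wants one entry per line: the headword, a tab, then the definition. The sketch below only checks for that shape; check_tabfile_input is a hypothetical helper of my own, not part of stardict-tools, and skipping blank lines is just a choice made here.

```python
def check_tabfile_input(lines):
    """Return (line_number, reason) pairs for lines that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are skipped in this check
        word, sep, definition = line.partition("\t")
        if not sep:
            problems.append((number, "no tab separator"))
        elif not word.strip():
            problems.append((number, "empty headword"))
        elif not definition.strip():
            problems.append((number, "empty definition"))
    return problems


# Example with one good line and one broken line.
example = ["啲\tdi1\\nplural prefix (Cantonese)\n", "broken line without a tab\n"]
print(check_tabfile_input(example))
```

To check a real file, open it with the right encoding and pass the file object in place of the example list.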

5 Responses to “Converting CantoFish dictionary for stardict”

  1. kenneth says:

    I use a Mac. Any chance you can email me the stardict format of the Cantonese dictionary? Many thanks!!!

  2. Andy says:

    Beautiful and works like a charm! Thanks!

  3. Fab says:

    Well, thanks. the script worked fine but now I’ve got a file of this kind:

    1 啲 di1\nplural prefix (Cantonese)
    2 啱 ngaam1\ncorrect (Cantonese)
    3 嗰 go2\nthat (Cantonese)
    ……

    How to bring it into stardict? As far as I know stardict needs a folder with three files .idx, .oft and .ifo … how to create them?

  4. hughperkins says:

    > How to bring it into stardict? As far as I know stardict needs a folder with three files .idx, .oft and .ifo … how to create them?

    Use tabfile:

    - sudo apt-get install stardict-tools
    - tabfile myfile.dat

    More info on this page: http://stardict.sourceforge.net/HowToCreateDictionary

  5. Fab says:

    hughperkins – many thanks!

    I had to make the tabfile an executable first – this is maybe too obvious for skilled users but nevertheless I finally managed to create the dictionary files and import it into stardict! I’ve learned a lot…thanks again for sharing the valuable code to make it happen!
