I'd like to make MySQL full text search work with Japanese and Chinese text, as well as any other language. The problem is that these languages and probably others do not norma
It's a year later and you probably don't need this any more, but the code on the following page might have some hints for what you want(ed) to do:
http://www.geocities.co.jp/SiliconValley-PaloAlto/7043/spamfilter/japanese-tokenizer.el.txt
If you have made any progress in your own search since the posts above, I am sure others would be interested to know.
(Edited to say there is a better answer here: How to classify Japanese characters as either kanji or kana?)
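
For what it's worth, here is a rough Python sketch of the kanji/kana classification idea, in case it helps anyone landing here. It is not taken from the linked tokenizer; the function names `classify_char` and `split_by_script` are my own, and it assumes that checking the basic Unicode blocks (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF) is good enough. It ignores the extension blocks and half-width katakana, and splitting on script changes is only a crude stand-in for real word segmentation.

```python
from itertools import groupby


def classify_char(ch: str) -> str:
    """Classify one character by its Unicode block (basic blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        # CJK Unified Ideographs: kanji in Japanese, hanzi in Chinese.
        return "kanji"
    return "other"


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same class into runs."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=classify_char)
    ]


if __name__ == "__main__":
    for script, run in split_by_script("東京タワーに行く"):
        print(script, run)
```

Joining the runs with spaces before inserting the text would give a word-delimited string that a whitespace-based full text index can at least split, though the "words" will not always match what a native reader would pick.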