I'm using a python library called Guess Language: http://pypi.python.org/pypi/guess-language/0.1
"justwords" is a string with unicode text. I stick it in the package,
It looks like you should be able to pass your unicode as-is. guessLanguage decodes an input that is str as utf-8. So your .encode('utf-8') is safe but unnecessary.
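To illustrate why the extra encode step changes nothing, here is a minimal sketch of the behaviour described above. It is written in Python 3 terms, where the byte string the answer calls str is bytes and unicode text is str. The helper to_text is hypothetical and is not part of the guess-language package; it only mimics "decode byte strings as UTF-8, leave text alone".

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, return text unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


justwords = "これは日本語です"          # text, as in the question
encoded = justwords.encode("utf-8")   # the extra step from the question

# Both routes end up as the same text, so encoding first is harmless but redundant.
same = to_text(justwords) == to_text(encoded)
```

If the library really normalises its input this way, passing justwords directly and passing justwords.encode('utf-8') should give it identical text to work with.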
I skimmed the source code and assumed that it relies exclusively on the data in its "trigrams" directory for language detection, and that it would therefore not handle Japanese because there is no "ja" subdirectory in there. That is not correct, as John Machin pointed out. So I have to assume your input is not what you think it is (which is hard to debug, since it is not showing up correctly in your question).
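One way to check what the input really contains is to look at its code points instead of how it renders. The sketch below is only a suggestion under that assumption: describe and has_kana are hypothetical helper names, and the kana check covers just the Hiragana and Katakana blocks (U+3040 to U+30FF), not kanji.

```python
import unicodedata


def describe(text, limit=20):
    """Return (character, code point, Unicode name) for the first `limit` characters."""
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text[:limit]
    ]


def has_kana(text):
    """True if any character falls in the Hiragana or Katakana blocks."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


good = "これは日本語です"
# Simulated damage: UTF-8 bytes wrongly decoded as Latin-1 (mojibake).
bad = good.encode("utf-8").decode("latin-1")

for row in describe(good, limit=3):
    print(row)
print(has_kana(good), has_kana(bad))
```

If has_kana returns False for text you believe is Japanese, or describe shows names such as LATIN SMALL LETTER A WITH TILDE, the string was probably decoded with the wrong encoding before it reached the language detector.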