A Viable Solution for Word Splitting Khmer?

既然无缘 2021-01-01 19:52

I am working on a solution to split long lines of Khmer text (the Cambodian language), encoded in UTF-8, into individual words. Khmer does not use spaces between words. There are a few …

3 Answers
  •  旧时难觅i
    2021-01-01 20:45

    The ICU library, which is available for Java (ICU4J) and Python (PyICU), has a DictionaryBasedBreakIterator class that can be used for this.
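    To show what "dictionary-based" breaking means, here is a minimal sketch of greedy longest-match (maximal matching) segmentation in plain Python. This is a swapped-in toy technique, not ICU's implementation: the `segment` function and the four-word dictionary are hypothetical and for illustration only, and a real Khmer segmenter needs a full lexicon and smarter handling of unknown text.

```python
from __future__ import annotations


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Toy greedy longest-match segmenter (illustrative, not ICU's algorithm).

    At each position, take the longest dictionary word that starts there.
    If no dictionary word matches, emit a single code point. A real
    segmenter would at least keep grapheme clusters (base letter plus
    Khmer signs) together instead of splitting off one code point.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: fall back to one code point.
            words.append(text[i])
            i += 1
    return words


if __name__ == "__main__":
    # Hypothetical mini-dictionary: "I", "love", "language", "Khmer".
    words = ["ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"]
    sentence = "".join(words)  # a run of Khmer text with no spaces
    print(segment(sentence, set(words)))  # prints the four words in order
```

    Greedy longest match is the simplest dictionary approach; it fails when taking the longest word early forces a bad split later, which is why production engines add lookahead or statistics. With PyICU, ICU's own word iterator is obtained with `BreakIterator.createWordInstance(Locale("km"))`; its boundary offsets are UTF-16 indices, which coincide with Python string indices for Khmer because the Khmer block lies in the Basic Multilingual Plane.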
