How to do a Python split() on languages (like Chinese) that don't use whitespace as word separator?

梦如初夏 2020-12-03 03:25

I want to split a sentence into a list of words.

For English and other European languages this is easy: just use split().

>>> "This is a sentence.".split()
['This', 'is', 'a', 'sentence.']

9 Answers
  •  没有蜡笔的小新
    2020-12-03 03:59

    It's partially possible for Japanese, where the character class (kanji, hiragana, katakana) usually changes at the beginning and end of a word. For Chinese it is much harder: there are whole scientific papers on the subject. I have a regular expression for splitting words in Japanese if you are interested: http://hg.hatta-wiki.org/hatta-dev/file/cd21122e2c63/hatta/search.py#l19
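
To illustrate the character-class idea, here is a minimal sketch that cuts Japanese text wherever the script changes. It is not the regular expression from the linked file; the function name `split_by_script` and the Unicode ranges chosen here are assumptions made for this example, and the result is only a rough approximation of word boundaries.

```python
import re

# Assumed character classes for this sketch (not taken from the linked regex):
# each alternative matches one run of characters from a single script.
SCRIPT_RUN = re.compile(
    r"[\u4e00-\u9fff]+"    # kanji (CJK Unified Ideographs)
    r"|[\u3040-\u309f]+"   # hiragana
    r"|[\u30a0-\u30ff]+"   # katakana, including the long-vowel mark
    r"|[A-Za-z0-9]+"       # ASCII letters and digits
)


def split_by_script(text):
    """Return runs of same-script characters; punctuation and spaces are dropped."""
    return SCRIPT_RUN.findall(text)


print(split_by_script("私はPythonが好きです。"))
# ['私', 'は', 'Python', 'が', '好', 'きです']
```

Note the limits: a run of hiragana such as きです is left as one piece, and text written in a single script (which is the normal case for Chinese) comes back as one unbroken run, so this approach does not help there.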
