I want to split a sentence into a list of words.
For English and other European languages this is easy: just use split().
>>> \"This is a sentence.
The best tokenizer for Chinese is pynlpir.
>>> import pynlpir
>>> pynlpir.open()
>>> mystring = "你汉语说的很好!"
>>> tokenized_string = pynlpir.segment(mystring, pos_tagging=False)
>>> tokenized_string
['你', '汉语', '说', '的', '很', '好', '!']
Be aware that pynlpir has a notorious but easily fixable licensing problem, for which you can find plenty of solutions online. You simply need to replace the NLPIR.user file in your NLPIR folder with a valid license downloaded from this repository, and then restart your environment.
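If you prefer to do the file replacement from Python, here is a minimal sketch, assuming you have already downloaded a fresh NLPIR.user file; both paths are hypothetical placeholders for your own setup:

import shutil

downloaded_license = "/path/to/downloaded/NLPIR.user"  # hypothetical: wherever you saved the new license
nlpir_folder = "/path/to/your/NLPIR"                   # hypothetical: the NLPIR folder mentioned above

shutil.copy(downloaded_license, nlpir_folder)          # overwrite the expired NLPIR.user

After copying, restart your Python session so that pynlpir.open() picks up the new license.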