I want to split a sentence into a list of words.
For English and European languages this is easy, just use split()
>>> \"This is a sentence.
The list() is the answer for Chinese only sentence. For those hybrid English/Chines in most of case. It answered at hybrid-split, just copy answer from Winter as below.
def spliteKeyWord(str):
regex = r"[\u4e00-\ufaff]|[0-9]+|[a-zA-Z]+\'*[a-z]*"
matches = re.findall(regex, str, re.UNICODE)
return matches