I want to split a sentence into a list of words.
For English and European languages this is easy, just use split()
>>> \"This is a sentence.
It's partially possible with Japanese, where you usually have different character classes at the beginning and end of the word, but there are whole scientific papers on the subject for Chinese. I have a regular expression for splitting words in Japanese if you are interested: http://hg.hatta-wiki.org/hatta-dev/file/cd21122e2c63/hatta/search.py#l19