I want to split a sentence into a list of words.
For English and European languages this is easy, just use split()
>>> \"This is a sentence.
Languages like Chinese have a very fluid definition of a word. E.g. One meaning of ma is "horse". One meaning of shang is "above" or "on top of". A compound is "mashang" which means literally "on horseback" but is used figuratively to mean "immediately". You need a very good dictionary with compounds in it and looking up the dictionary needs a longest-match approach. Compounding is rife in German (famous example is something like "Danube steam navigation company director's wife" being expressed as one word), Turkic languages, Finnish, and Magyar -- these languages have very long words many of which won't be found in a dictionary and need breaking down to understand them.
Your problem is one of linguistics, nothing to do with Python.