Python: any way to perform this “hybrid” split() on multi-lingual (e.g. Chinese & English) strings?
I have multilingual strings that mix languages that use whitespace as a word separator (English, French, etc.) with languages that don't (Chinese, Japanese, Korean). Given such a string, I want to split the English/French/etc. parts into words, using whitespace as the separator, and split the Chinese/Japanese/Korean parts into individual characters. I want to put all of those separated components into a single list.

Some examples would probably make this clear:

Case 1: English-only string. This case is easy:

    >>> "I love Python".split()
    ['I', 'love', 'Python']

Case 2: Chinese-only