Python: any way to perform this “hybrid” split() on multi-lingual (e.g. Chinese & English) strings?

后悔当初 2021-01-31 23:06

I have multilingual strings that mix languages that use whitespace as a word separator (English, French, etc.) with languages that don't (Chinese, Japanese, Korean

5 Answers

暖寄归人 (original poster)

2021-01-31 23:26
In Python 3, this regex approach works. It also splits out numbers as separate tokens, if you need that.
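```
import re

def spliteKeyWord(text):
    # One token per CJK character, per run of digits, or per Latin word
    # (optionally followed by apostrophes and lowercase letters, e.g. "don't").
    regex = r"[\u4e00-\ufaff]|[0-9]+|[a-zA-Z]+'*[a-z]*"
    # Python 3 str patterns are Unicode-aware by default, so no flag is needed.
    return re.findall(regex, text)

print(spliteKeyWord("Testing English text我爱Python123"))
```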
=> ['Testing', 'English', 'text', '我', '爱', 'Python', '123']
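
If Japanese kana or Korean text also needs to be split per character, the same idea can probably be extended with more explicit character ranges. The following is only a minimal sketch: the name `hybrid_split`, the chosen Unicode ranges, and the stricter apostrophe rule are my assumptions, not part of the answer above, and you may want different ranges for your data.

```python
import re

# Assumed character ranges (not from the answer above):
#   \u3040-\u30ff  Hiragana and Katakana
#   \u4e00-\u9fff  CJK Unified Ideographs
#   \uac00-\ud7a3  Hangul syllables
CJK_CHAR = r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]"

# Hypothetical pattern: one CJK character, a run of digits, or a Latin word
# that may contain internal apostrophes (e.g. "don't").
TOKEN = re.compile(CJK_CHAR + r"|[0-9]+|[A-Za-z]+(?:'[A-Za-z]+)*")

def hybrid_split(text):
    """Return tokens: each CJK character on its own, other words whole."""
    return TOKEN.findall(text)

print(hybrid_split("I don't know 日本語のテキスト 2021"))
```

Splitting kana and Hangul one character at a time is a simplification. For real word boundaries in those languages, a dedicated segmenter would likely do better than a regex.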