Python - Split a string but keep contiguous uppercase letters [duplicate]

Submitted by 一个人想着一个人 on 2020-12-08 05:14:39

Question


I would like to split strings into separate words at capital letters. When a string contains contiguous uppercase letters, though, I want to keep them together as one word up to, but not including, the last capital (which probably starts a new word).

For example:

splitThreeWords -> [split, three, words]
SplitThreeWords -> [split, three, words]
ILOSummit -> [ILO, summit]

Answer 1:


Use re.split with a capturing group (so that the matched words are kept in the result) and filter out the empty chunks:

import re


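def split_by_title_word(s):
    # The capturing group keeps each matched title word in re.split's result;
    # "if chunk" drops the empty strings produced wherever a match touches
    # another match or an end of the string.
    return [chunk for chunk in re.split(r"([A-Z][a-z]+)", s) if chunk]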


print(split_by_title_word("splitThreeWords"))
print(split_by_title_word("SplitThreeWords"))
print(split_by_title_word("ILOSummit"))

Output

['split', 'Three', 'Words']
['Split', 'Three', 'Words']
['ILO', 'Summit']
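To see what the "if chunk" filter is for, here is the raw re.split output for the first example, with and without the capturing group. This is only a small illustration; the variable names are arbitrary.

```python
import re

s = "splitThreeWords"

# With the capturing group, re.split keeps each matched title word and leaves
# an empty string between adjacent matches and after a match that ends the string.
with_group = re.split(r"([A-Z][a-z]+)", s)

# Without the group, the matched words are treated as separators and dropped.
without_group = re.split(r"[A-Z][a-z]+", s)

print(with_group)     # ['split', 'Three', '', 'Words', '']
print(without_group)  # ['split', '', '']
```

Filtering out the empty strings from the first list gives exactly the answer's result.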

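The pattern [A-Z][a-z]+ matches a title word: one uppercase letter followed by one or more lowercase letters. Anything between two matches, such as a leading lowercase word or a run of capitals, is returned as a chunk of its own.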
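The answer's output keeps the original capitalization, while the question's examples expect lowercase words with acronyms left in capitals. The pattern also leaves an acronym attached to a preceding lowercase word: "getHTTPResponse" becomes ['getHTTP', 'Response']. Below is a minimal sketch that addresses both by swapping re.split for re.findall with a lookahead. split_camel is a hypothetical name, and the sketch assumes the input contains only ASCII letters (digits and other characters are silently dropped).

```python
import re

# Alternatives, tried left to right at each position:
#   [A-Z]+(?=[A-Z][a-z])  an uppercase run, stopping before the capital that starts the next word
#   [A-Z]?[a-z]+          an optionally capitalized lowercase word
#   [A-Z]+                an uppercase run at the end of the string (e.g. a trailing acronym)
WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_camel(s):
    """Hypothetical helper: split camelCase/PascalCase, keeping acronyms whole.

    Ordinary words are lowercased; all-uppercase chunks (acronyms) are kept as is.
    Characters outside A-Z/a-z are not matched, so they are dropped.
    """
    return [w if w.isupper() else w.lower() for w in WORD.findall(s)]


for s in ["splitThreeWords", "SplitThreeWords", "ILOSummit", "getHTTPResponse"]:
    print(s, "->", split_camel(s))
```

With the question's inputs this gives ['split', 'three', 'words'] twice and ['ILO', 'summit'], and "getHTTPResponse" gives ['get', 'HTTP', 'response']. The plain [A-Z]+ fallback has to come last: placed earlier, it would grab the capital that starts the next word.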



Source: https://stackoverflow.com/questions/64784769/python-split-a-string-but-keep-contiguous-uppercase-letters
