Question
I would like to split strings into separate words at capital letters. However, if a string contains consecutive uppercase letters, they should stay together as one word up to, but not including, the last uppercase letter (which probably starts a new word).
For example:
splitThreeWords -> [split, three, words]
SplitThreeWords -> [split, three, words]
ILOSummit -> [ILO, summit]
Answer 1:
Use re.split with a capturing group (so the matched title words are kept in the result) and filter out the empty chunks:
import re

def split_by_title_word(s):
    # The capturing group makes re.split keep each matched title word;
    # the filter drops the empty strings left between adjacent matches.
    return [chunk for chunk in re.split(r"([A-Z][a-z]+)", s) if chunk]

print(split_by_title_word("splitThreeWords"))
print(split_by_title_word("SplitThreeWords"))
print(split_by_title_word("ILOSummit"))
Output
['split', 'Three', 'Words']
['Split', 'Three', 'Words']
['ILO', 'Summit']
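To see what the capturing group and the empty-chunk filter each contribute, it can help to look at the raw re.split results for one of the inputs. This is an illustrative check, not part of the original answer:

```python
import re

# Same title-word pattern as the answer, with and without the capturing group.
with_group = re.split(r"([A-Z][a-z]+)", "SplitThreeWords")
without_group = re.split(r"[A-Z][a-z]+", "SplitThreeWords")

print(with_group)     # ['', 'Split', '', 'Three', '', 'Words', '']
print(without_group)  # ['', '', '', '']

# Dropping the empty strings gives the answer's result.
print([chunk for chunk in with_group if chunk])  # ['Split', 'Three', 'Words']
```

Without the group, the title words themselves are consumed as separators and only empty strings remain; with it, they are kept, and the filter removes the empty strings produced where two matches touch or sit at the ends of the string.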
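The pattern [A-Z][a-z]+ matches a title-case word: one uppercase letter followed by one or more lowercase letters. A run of uppercase letters such as "ILO" never matches on its own, so it is left behind as the text between matches and comes out as a single chunk.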
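The answer keeps the original capitalization ('Three', 'Summit'), while the question's expected output lowercases title words and keeps acronyms. A possible variant, sketched under that assumption, uses re.findall with a lookahead instead of re.split, and also handles an uppercase run at the end of the string. The names WORD_RE and split_camel_words are hypothetical, and this is not the answerer's method:

```python
import re

# Hypothetical alternative to the answer's re.split approach (not from the answer).
# Alternatives, tried left to right at each position:
#   [A-Z]+(?=[A-Z][a-z])  an uppercase run, stopping before the letter that starts a title word
#   [A-Z]?[a-z]+          an optionally capitalized lowercase word
#   [A-Z]+                a remaining uppercase run, e.g. "HTTP" at the end
WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def split_camel_words(s):
    """Split s into words; lowercase everything except all-uppercase acronyms."""
    words = WORD_RE.findall(s)
    # Keep acronyms such as "ILO" as-is, lowercase the rest to match the question.
    return [w if w.isupper() else w.lower() for w in words]


print(split_camel_words("splitThreeWords"))  # ['split', 'three', 'words']
print(split_camel_words("SplitThreeWords"))  # ['split', 'three', 'words']
print(split_camel_words("ILOSummit"))        # ['ILO', 'summit']
print(split_camel_words("getHTTP"))          # ['get', 'HTTP']
```

For comparison, the answer's split_by_title_word returns ['getHTTP'] unchanged for the last input, because that string contains no title word to split on. Note that findall only returns matches, so characters outside A-Z and a-z (digits, underscores) are silently dropped by this sketch.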
Source: https://stackoverflow.com/questions/64784769/python-split-a-string-but-keep-contiguous-uppercase-letters