Splitting on group of capital letters in Python

半城伤御伤魂 submitted on 2021-02-16 14:10:54

Question


I'm trying to tokenize a number of strings using a capital letter as a delimiter. I have landed on the following code:

import re

token = [a for a in re.split(r'([A-Z][a-z]*)', "ABCowDog") if a]

print(token)

And I get this, as expected, in return:

['A', 'B', 'Cow', 'Dog']

Now, this is just an example string to make life easier, but in my case I want to go through this list, find the individual characters (easy enough by checking len()), and join consecutive individual letters together, provided they meet a prior definition. In the example above, 'AB', 'Cow', and 'Dog' are the strings I actually want to form (consecutive capitals are part of an acronym). For whatever reason, once I have my token list, I am unable to figure out how to walk it. Sorry if this has a simple answer, but I'm fairly new to Python and am sick of banging my head against the wall.


Answer 1:


re.split isn't always easy to use and can be limiting in some situations. You can try a different approach with re.findall:

>>> import re
>>> s = 'ABCowDog'
>>> re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', s)
['AB', 'Cow', 'Dog']
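
As a rough sketch of how that pattern could be wrapped for reuse (the names CAPS_PATTERN and split_caps are hypothetical, not part of the answer): the alternation first tries a run of capitals that is not followed by a lowercase letter, and otherwise falls back to one capital plus any trailing lowercase letters.

```python
import re

# Hypothetical names; the pattern itself is the one given in the answer above.
CAPS_PATTERN = re.compile(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)')

def split_caps(s):
    """Return the acronym runs and capitalized words found in s, in order."""
    return CAPS_PATTERN.findall(s)

print(split_caps('ABCowDog'))  # ['AB', 'Cow', 'Dog']
print(split_caps('CowABDog'))  # ['Cow', 'AB', 'Dog']
```

Note that findall only returns what the pattern matches, so any characters before the first capital letter (leading lowercase text, for instance) are silently dropped.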



Answer 2:


You can split on the following lookahead pattern using the third-party regex module:

(?=[A-Z][a-z])


Code:

import regex

regex.split(r'(?=[A-Z][a-z])', "ABCowDog", flags=regex.VERSION1)
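
If installing the third-party regex module is not an option, the same lookahead should also work with the standard re module on Python 3.7 and later, where re.split accepts patterns that can match an empty string. A minimal sketch under that version assumption (the helper name split_before_words is hypothetical):

```python
import re

def split_before_words(s):
    # Split at every position that is followed by a capital and then a
    # lowercase letter. Assumes Python 3.7+, where re.split splits on
    # empty matches. The filter drops the empty first piece produced
    # when the string itself starts with such a word.
    return [part for part in re.split(r'(?=[A-Z][a-z])', s) if part]

print(split_before_words("ABCowDog"))  # ['AB', 'Cow', 'Dog']
```

On older Python versions, re.split does not split on empty matches, which is presumably why the answer reaches for regex with VERSION1.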



Answer 3:


([A-Z][a-z]+)

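You should split by this and discard the empty strings, as the list comprehension in the question already does. Using + instead of * means a single capital letter is no longer matched as a word on its own, so consecutive capitals stay together.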
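
A short sketch of how this pattern could be applied with the standard re module, reusing the empty-string filter from the question. Because the pattern sits inside a capturing group, re.split returns the matched words along with the text between them:

```python
import re

s = "ABCowDog"
# The raw split yields ['AB', 'Cow', '', 'Dog', '']; the filter removes
# the empty pieces left between and after adjacent matches.
tokens = [t for t in re.split(r'([A-Z][a-z]+)', s) if t]
print(tokens)  # ['AB', 'Cow', 'Dog']
```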



Source: https://stackoverflow.com/questions/30598511/splitting-on-group-of-capital-letters-in-python
