Regex for non-consecutive upper-case words PART DEUX

牧云@^-^@ submitted on 2019-12-24 19:16:04

Question


Very many thanks to all those who answered part 1 of this question.

The regex that worked for me was:

(?<![A-Z]\s)\b[A-Z]+\b(?!\s[A-Z])

The question now is how to do the inverse, i.e. given the string

This is a different sentence WITH a few CAPITAL WORDS here AND THERE ACROSS multiple LINES.

How can I match "CAPITAL WORDS" and "AND THERE ACROSS" but not "WITH" or "LINES"? Those two are isolated: they either have lower-case words on both sides or sit at the start or end of a sentence.

I tried changing the lookarounds from negative to positive and altering the [A-Z] to [a-z], but again failed spectacularly.

Any help would be much appreciated once again.


Answer 1:


At least two consecutive upper-case words:

 [A-Z]{2,}(?:\s+[A-Z]{2,})+

 [A-Z]{2,}           # first word (at least two letters)
 (?:                 # start of a non-capturing group
     \s+[A-Z]{2,}    # whitespace followed by another word
 )+                  # repeat the group one or more times
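
The commented breakdown above can be kept inside the pattern itself with Python's `re.VERBOSE` flag, which ignores unescaped whitespace and `#` comments in the regex. This is only a sketch of the same pattern applied to the sentence from the question; the name `CONSECUTIVE_CAPS` is an illustrative choice and does not come from the original answer.

```python
import re

# Same pattern as in the answer, written in verbose mode.
# CONSECUTIVE_CAPS is a hypothetical name chosen for this sketch.
CONSECUTIVE_CAPS = re.compile(
    r"""
    [A-Z]{2,}          # first word (at least two letters)
    (?:                # non-capturing group
        \s+[A-Z]{2,}   # whitespace followed by another word
    )+                 # repeat the group one or more times
    """,
    re.VERBOSE,
)

text = ("This is a different sentence WITH a few CAPITAL WORDS "
        "here AND THERE ACROSS multiple LINES.")

# findall returns whole matches because the only group is non-capturing.
matches = CONSECUTIVE_CAPS.findall(text)
print(matches)  # ['CAPITAL WORDS', 'AND THERE ACROSS']
```

Because each word needs at least two letters, single-letter capitals such as "A" or "I" are never counted as part of a run.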

In [52]: re.findall(r'[A-Z]{2,}(?:\s+[A-Z]{2,})+', 'CAPITAL Words This is a different sentence WITH a few CAPITAL\nWORDS here AND THERE ACROSS multiple LINES.')
Out[52]: ['CAPITAL\nWORDS', 'AND THERE ACROSS']
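
One caveat: the pattern as written has no word-boundary anchors, so it can also match the upper-case tail of a mixed-case token. If that matters for your input, a possible variant adds `\b` around each word. The sketch below compares the two; `LOOSE`, `STRICT`, and the sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: no word boundaries.
LOOSE = re.compile(r"[A-Z]{2,}(?:\s+[A-Z]{2,})+")

# Hypothetical stricter variant: every word must start and end at a word boundary.
STRICT = re.compile(r"\b[A-Z]{2,}\b(?:\s+[A-Z]{2,}\b)+")

sample = "camelCASE WORDS here AND THERE"

loose_hits = LOOSE.findall(sample)    # also picks up the tail of "camelCASE"
strict_hits = STRICT.findall(sample)  # only whole upper-case words count

print(loose_hits)   # ['CASE WORDS', 'AND THERE']
print(strict_hits)  # ['AND THERE']
```

Note that `\b` treats digits and underscores as word characters, so a token such as `ABC_DEF` is not split into two words by the stricter variant.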


Source: https://stackoverflow.com/questions/20542191/regex-for-non-consecutive-upper-case-words-part-deux
