Question
I'm trying to use regex in Python to match acronyms separated by periods. I have the following code:
import re
test_string = "U.S.A."
pattern = r'([A-Z]\.)+'
print(re.findall(pattern, test_string))
The result of this is:
['A.']
I'm confused as to why this is the result. I know + is greedy, but why are the first occurrences of [A-Z]\. ignored?
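Answer 1:
The (...) in regex creates a capturing group. When the pattern contains a group, re.findall returns the group's contents instead of the whole match, and a repeated group keeps only its last repetition, which is why you get 'A.'. I suggest changing to a non-capturing group: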
pattern = r'(?:[A-Z]\.)+'
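To make the difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The variable names (capturing, non_capturing, whole_matches) are just for illustration; the re.finditer variant is an alternative not mentioned in the answer, shown for the case where you want to keep the capturing group.

```python
import re

test_string = "U.S.A."

# Capturing group: findall returns the group's contents, and a repeated
# group only remembers its final repetition.
capturing = re.findall(r'([A-Z]\.)+', test_string)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'(?:[A-Z]\.)+', test_string)

# Alternative (an assumption about what you might want): keep the capturing
# group but read the whole match from each match object via group(0).
whole_matches = [m.group(0) for m in re.finditer(r'([A-Z]\.)+', test_string)]

print(capturing)      # ['A.']
print(non_capturing)  # ['U.S.A.']
print(whole_matches)  # ['U.S.A.']
```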
Answer 2:
Description
This regex will:
- capture all the acronyms like U.S.A. in a sentence
- avoid matching uppercase words at the end of a sentence
(?:(?<=\.|\s)[A-Z]\.)+
Example
Live Example: http://www.rubular.com/r/9bslFxvfzQ
Sample Text
This is the U.S.A. we have RADAR.
Matches
U.S.A.
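For reference, here is a rough sketch of the same check in Python rather than on Rubular, using the sample text above. The names acronym_pattern, sample_text and start_text are made up for this example. The second call shows a caveat that seems worth knowing: the lookbehind needs a period or whitespace before each letter, so an acronym at the very start of the string loses its first letter.

```python
import re

# The pattern from this answer: each letter must be preceded by a period
# or a whitespace character, and followed by a period.
acronym_pattern = re.compile(r'(?:(?<=\.|\s)[A-Z]\.)+')

sample_text = "This is the U.S.A. we have RADAR."
matches = acronym_pattern.findall(sample_text)
print(matches)  # ['U.S.A.']

# Caveat: at the start of the string nothing precedes "U", so the
# lookbehind fails there and the match begins at "S." instead.
start_text = "U.S.A. is a country."
start_matches = acronym_pattern.findall(start_text)
print(start_matches)  # ['S.A.']
```

Since the group is non-capturing, findall returns the full matches here, same as in Answer 1.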
Source: https://stackoverflow.com/questions/17779771/finding-acronyms-using-regex-in-python