Finding Acronyms Using Regex In Python

Submitted by 流过昼夜 on 2019-12-02 02:52:24

Question


I'm trying to use regex in Python to match acronyms separated by periods. I have the following code:

import re
test_string = "U.S.A."
pattern = r'([A-Z]\.)+'
print(re.findall(pattern, test_string))

The result of this is:

['A.']

I'm confused as to why this is the result. I know + is greedy, but why are the first occurrences of [A-Z]\. ignored?


Answer 1:


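The (...) in regex creates a capturing group. When a pattern contains a capturing group, re.findall returns the text of that group rather than the whole match, and a repeated group keeps only its last repetition, which here is 'A.'. I suggest changing to a non-capturing group: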

pattern = r'(?:[A-Z]\.)+'
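
As a rough illustration of the difference, the sketch below runs both patterns against the string from the question. The variable names are illustrative only, and the re.finditer line is an extra alternative not mentioned in the answer: it reads the whole match through group(0), which works even when the capturing pattern is kept.

```python
import re

test_string = "U.S.A."

capturing = r'([A-Z]\.)+'        # original pattern: one capturing group, repeated
non_capturing = r'(?:[A-Z]\.)+'  # suggested pattern: no capturing group

# With a capturing group, findall returns the group's text, and a repeated
# group keeps only its final repetition.
capturing_result = re.findall(capturing, test_string)

# Without a capturing group, findall returns the whole match.
non_capturing_result = re.findall(non_capturing, test_string)

# finditer yields match objects; group(0) is always the whole match.
whole_matches = [m.group(0) for m in re.finditer(capturing, test_string)]

print(capturing_result)      # ['A.']
print(non_capturing_result)  # ['U.S.A.']
print(whole_matches)         # ['U.S.A.']
```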



Answer 2:


Description

This regex will:

  • capture all the acronyms like U.S.A. in a sentence
  • avoid matching uppercase words at the end of a sentence

(?:(?<=\.|\s)[A-Z]\.)+

Example

Live Example: http://www.rubular.com/r/9bslFxvfzQ

Sample Text

This is the U.S.A. we have RADAR.

Matches

U.S.A.
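
The live example above runs on Rubular, which uses Ruby's regex engine. As a minimal sketch, assuming the same pattern is used unchanged with Python's re module, the sample text can be checked as follows. The second call is an added observation rather than part of the original answer: because the lookbehind needs a preceding period or whitespace character, an acronym at the very start of the string loses its first letter.

```python
import re

# Pattern from the answer: each letter must be preceded by a period or whitespace.
pattern = r'(?:(?<=\.|\s)[A-Z]\.)+'
sample_text = "This is the U.S.A. we have RADAR."

# No capturing group is present, so findall returns whole matches.
matches = re.findall(pattern, sample_text)
print(matches)  # ['U.S.A.']

# Caveat: at the start of the string nothing precedes "U", so the lookbehind
# fails there and the match begins at "S".
start_matches = re.findall(pattern, "U.S.A. is a country.")
print(start_matches)  # ['S.A.']
```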


Source: https://stackoverflow.com/questions/17779771/finding-acronyms-using-regex-in-python
