Extract the Name and Span of Regex Matched Groups

Question

I have a regex that looks like:

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'

Getting the matched string is no problem: m.group(name) does that. However, I need to extract the name and span of each matched group (or even just the span by name), and I haven't found a way to do this. I would like to do something like:

p = re.compile(rgx, re.IGNORECASE)
m = p.match(targetstring)
# then do something to set 'all' to the list of matched groups (pseudocode)
for mo in all:
    print(mo.name() + ' -> ' + str(mo.span()))

So for example the input string 'ABCDEFHIJK' should generate the output:

'foo'  -> (0, 3)
'bar'  -> (3, 6)
'norf' -> (6, 10)

Thanks!

Answer 1:

You can iterate over the names of the matched groups (the keys of groupdict) and call the match object's span method with each name:

import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJKLM')

# groupdict() maps each group name to the text that group matched
for key in m.groupdict():
    print(key, m.span(key))

This prints:

foo (0, 3)
bar (3, 6)
norf (6, 10)

Edit: Before Python 3.7, dictionaries did not guarantee any key order, so you may wish to choose the iteration order explicitly. In the example below, sorted(...) is a list of the group names sorted by each group's span tuple:

for key in sorted(m.groupdict(), key=m.span):
    print(key, m.span(key))
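
Because bar is optional in the pattern, it may not take part in a match. In that case m.group('bar') is None and m.span('bar') is (-1, -1). The following is a minimal sketch, in Python 3 syntax, that skips such groups and orders the rest by position. The helper name named_spans is hypothetical and not part of the re module:

```python
import re


def named_spans(match):
    """Return (name, span) pairs for the named groups that took part in the match."""
    pairs = []
    for name in match.re.groupindex:
        span = match.span(name)
        # span() returns (-1, -1) for a group that did not participate
        if span != (-1, -1):
            pairs.append((name, span))
    # Order by position in the target string
    return sorted(pairs, key=lambda pair: pair[1])


rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
m = re.match(rgx, 'abchijk', re.IGNORECASE)

for name, span in named_spans(m):
    print(name, span)
```

Here the optional group bar is absent from the input, so only foo and norf are reported.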

Answer 2:

You can use the compiled pattern's groupindex attribute (RegexObject.groupindex), which maps each group name to its group number:

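import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJK')

# Sort by group number so the groups come out in pattern order
for name, n in sorted(m.re.groupindex.items(), key=lambda x: x[1]):
    print(name, m.group(n), m.span(n))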
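
For reference, here is a small Python 3 sketch of the same groupindex approach that builds lines in the format asked for in the question. It also checks that slicing the target string with each span reproduces the text returned by m.group. The variable names target and lines are illustrative assumptions:

```python
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
target = 'ABCDEFHIJK'
m = p.match(target)

lines = []
# groupindex maps group name -> group number; sort by number for pattern order
for name, n in sorted(p.groupindex.items(), key=lambda item: item[1]):
    start, end = m.span(n)
    # The span indexes into the original string
    assert target[start:end] == m.group(name)
    lines.append('{!r} -> {}'.format(name, (start, end)))

print('\n'.join(lines))
```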

Source: https://stackoverflow.com/questions/26465299/extract-the-name-and-span-of-regex-matched-groups

Tags

python

regex