Question
I have a regex that looks like:
rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
Getting the matched string is no problem: m.group(name). However, I need to extract the name and span of each matched group (or even just the span by name), and I haven't found a way to do this. I would like to do something like:
p = re.compile(rgx, re.IGNORECASE)
m = p.match(targetstring)
# then do something to set 'all' to the list of match objects
for mo in all:
    # wished-for API: name() does not exist on match objects
    print(mo.name() + ' -> ' + str(mo.span()))
So for example the input string 'ABCDEFHIJK' should generate the output:
'foo' -> (0, 3)
'bar' -> (3, 6)
'norf' -> (6, 10)
Thanks!
Answer 1:
You can iterate over the names of the named groups (the keys of m.groupdict()) and call m.span(name) to get the corresponding span:
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJKLM')
# groupdict() maps each group name to the text it matched
for key in m.groupdict():
    print(key, m.span(key))
This prints:
foo (0, 3)
bar (3, 6)
norf (6, 10)
Edit: Dictionaries were unordered before Python 3.7, so you may wish to choose the iteration order explicitly. In the example below, sorted(...) returns the group names sorted by their span tuple, so the groups are printed in order of their position in the string:
for key in sorted(m.groupdict(), key=m.span):
    print(key, m.span(key))
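One thing the loop above does not show is what happens when the optional group (bar) takes no part in the match. In that case m.span('bar') returns (-1, -1). Here is a minimal sketch that skips such groups and orders the rest by span; the helper name named_spans is made up for illustration and is not part of the re module:

```python
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)

def named_spans(m):
    """Return (name, span) pairs for the named groups that matched, ordered by span."""
    pairs = [(name, m.span(name)) for name in m.groupdict()]
    # A group that did not take part in the match has span (-1, -1).
    matched = [(name, span) for name, span in pairs if span != (-1, -1)]
    return sorted(matched, key=lambda item: item[1])

# 'DEF' is missing here, so the optional group 'bar' is skipped.
for name, span in named_spans(p.match('abcHIJK')):
    print(name, span)
```

For the input 'abcHIJK' this should list only foo and norf, since bar never matched.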
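Answer 2:
You can use the compiled pattern's groupindex attribute (RegexObject.groupindex in Python 2, re.Pattern.groupindex in Python 3), which maps each group name to its group number:
# rgx is the pattern string from the question
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJK')
# sort by group number so the names come out in pattern order
for name, n in sorted(m.re.groupindex.items(), key=lambda x: x[1]):
    print(name, m.group(n), m.span(n))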
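The groupindex approach can be wrapped into a small self-contained function. This is only a sketch: the name groups_in_pattern_order and the choice to return an empty list when nothing matches are assumptions made for illustration, not something the answer specifies.

```python
import re

def groups_in_pattern_order(pattern, text):
    """Return (name, matched_text, span) for every named group, in pattern order."""
    m = pattern.match(text)
    if m is None:
        return []  # assumption: no match is reported as an empty list
    # groupindex maps group name -> group number; sort by that number.
    by_number = sorted(pattern.groupindex.items(), key=lambda item: item[1])
    return [(name, m.group(number), m.span(number)) for name, number in by_number]

p = re.compile('(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)', re.IGNORECASE)
for name, matched, span in groups_in_pattern_order(p, 'ABCDEFHIJK'):
    print(name, matched, span)
```

Unlike filtering on the span, this keeps unmatched optional groups in the result, with None as the text and (-1, -1) as the span, so the caller can decide how to treat them.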
Source: https://stackoverflow.com/questions/26465299/extract-the-name-and-span-of-regex-matched-groups