re.findall which returns a dict of named capturing groups?

忘了有多久 2020-12-04 15:12

Inspired by a now-deleted question: given a regex with named groups, is there a method like findall that returns a list of dicts keyed by the named capturing groups?

4 answers
  • 2020-12-04 15:54

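    If you are using match:

    import re
    r = re.match(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)', text)  # None if text does not match at its start
    r.groupdict()  # dict mapping each group name to its matched substring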
    

    See the re module documentation for groupdict().
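Keep in mind that re.match only tries to match at the start of the string and returns None when nothing matches, so calling groupdict() on the result directly can raise AttributeError. A minimal sketch of a guarded version, assuming an illustrative sample string; the helper name first_groupdict and the inputs are hypothetical and not part of the original answer:

```python
import re

PATTERN = r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)'

def first_groupdict(pattern, text):
    """Return the named groups of a match at the start of text, or None."""
    m = re.match(pattern, text)
    return m.groupdict() if m is not None else None

# Hypothetical sample inputs, chosen only for illustration.
first = first_groupdict(PATTERN, "bob sue jon richard harry")
missing = first_groupdict(PATTERN, "123")

print(first)    # only the first pair is returned
print(missing)  # no match at the start of the string
```

This covers only the first match; collecting every match needs an iterating method such as finditer.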

  • 2020-12-04 16:01
    >>> import re
    >>> s = "bob sue jon richard harry"
    >>> r = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')
    >>> [m.groupdict() for m in r.finditer(s)]
    [{'name': 'bob', 'name2': 'sue'}, {'name': 'jon', 'name2': 'richard'}]
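If this is needed in several places, the comprehension can be wrapped in a small function. The following is only a sketch: the name findall_dicts is made up for illustration and is not part of the re module.

```python
import re

def findall_dicts(pattern, text, flags=0):
    """Like re.findall, but return one dict of named groups per match."""
    return [m.groupdict() for m in re.finditer(pattern, text, flags)]

results = findall_dicts(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)',
                        "bob sue jon richard harry")
print(results)
```

Because finditer yields match objects, the same loop could also expose m.span() or m.group(0) if the position or the full match is needed alongside the named groups.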
    
  • 2020-12-04 16:01

    There's no built-in method for doing this, but the expected result can be achieved by using list comprehensions.

    [dict([[k, i if isinstance(i, str) else i[v-1]] for k,v in pat.groupindex.items()]) for i in pat.findall(text)]
    

    With friendly formatting:

    >>> [
    ...     dict([
    ...         [k, i if isinstance(i, str) else i[v-1]]
    ...         for k,v in pat.groupindex.items()
    ...     ])
    ...     for i in pat.findall(text)
    ... ]
    

    The outer list comprehension iterates over the result of findall, which is either a list of strings or a list of tuples: a pattern with 0 or 1 capturing groups yields a list of str, and a pattern with 2 or more yields a list of tuples.
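The different return shapes of findall are easy to check directly. A short sketch, using variations of the pattern from this thread with the group names dropped; the variable names are illustrative only:

```python
import re

s = "bob sue jon richard harry"

# No capturing groups: each item is the whole match, as a str.
no_groups = re.findall(r'[a-z]+\s+[a-z]+', s)

# One capturing group: each item is that group's text, as a str.
one_group = re.findall(r'([a-z]+)\s+[a-z]+', s)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'([a-z]+)\s+([a-z]+)', s)

print(no_groups)
print(one_group)
print(two_groups)
```

This is why the comprehension has to test isinstance(i, str) before indexing into the item.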

    For each item in the result, we build a dict from an inner list comprehension over the groupindex attribute of the compiled pattern, which maps each group name to its group number:

    >>> pat.groupindex
    mappingproxy({'name': 1, 'name2': 2})
    

    A [key, value] pair is built for each entry in groupindex. If the item from findall is a tuple, the group number from groupindex selects the matching element (group numbers start at 1, hence v-1); otherwise the item itself is assigned to the only named group.

    [k, i if isinstance(i, str) else i[v-1]]
    

    Finally, a dict is constructed from the list of lists of strings.

    Note that groupindex contains only named groups, so non-named capturing groups will be omitted from the resulting dict.

    And the result:

    >>> [dict([[k, i if isinstance(i, str) else i[v-1]] for k,v in pat.groupindex.items()]) for i in pat.findall(text)]
    [{'name': 'bob', 'name2': 'sue'}, {'name': 'jon', 'name2': 'richard'}]
    
  • 2020-12-04 16:11

    You could switch to finditer:

    >>> import re
    >>> text = "bob sue jon richard harry"
    >>> pat = re.compile(r'(?P<name>[a-z]+)\s+(?P<name2>[a-z]+)')
    >>> for m in pat.finditer(text):
    ...     print(m.groupdict())
    ...
    {'name': 'bob', 'name2': 'sue'}
    {'name': 'jon', 'name2': 'richard'}
    