How do you translate this regular-expression idiom from Perl into Python?

情深已故 2020-12-04 11:16

I switched from Perl to Python about a year ago and haven't looked back. There is only one idiom that I've ever found I can do more easily in Perl than in Python:

15 answers
  • 2020-12-04 11:49
    r"""
    This is an extension of the re module. It stores the last successful
    match object and lets you access its methods and attributes via
    this module.
    
    This module exports the following additional functions:
        expand  Return the string obtained by doing backslash substitution on a
                template string.
        group   Returns one or more subgroups of the match.
        groups  Return a tuple containing all the subgroups of the match.
        start   Return the indices of the start of the substring matched by
                group.
        end     Return the indices of the end of the substring matched by group.
        span    Returns a 2-tuple of (start(), end()) of the substring matched
                by group.
    
    This module defines the following additional public attributes:
        pos         The value of pos which was passed to the search() or match()
                    method.
        endpos      The value of endpos which was passed to the search() or
                    match() method.
        lastindex   The integer index of the last matched capturing group.
        lastgroup   The name of the last matched capturing group.
        re          The regular expression object which was passed to search() or
                    match().
        string      The string passed to match() or search().
    """
    
    import re as re_
    
    from re import *
    from functools import wraps
    
    __all__ = re_.__all__ + [ "expand", "group", "groups", "groupdict", "start", "end",
            "span", "last_match", "pos", "endpos", "lastindex", "lastgroup", "re",
            "string" ]
    
    last_match = pos = endpos = lastindex = lastgroup = re = string = None
    
    def _set_match(match=None):
        global last_match, pos, endpos, lastindex, lastgroup, re, string
        if match is not None:
            last_match = match
            pos = match.pos
            endpos = match.endpos
            lastindex = match.lastindex
            lastgroup = match.lastgroup
            re = match.re
            string = match.string
        return match
    
    @wraps(re_.match)
    def match(pattern, string, flags=0):
        return _set_match(re_.match(pattern, string, flags))
    
    
    @wraps(re_.search)
    def search(pattern, string, flags=0):
        return _set_match(re_.search(pattern, string, flags))
    
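    @wraps(re_.findall)
    def findall(pattern, string, flags=0):
        # re.findall returns strings or tuples, not match objects, so record
        # the last match with finditer before returning the usual result.
        for m in re_.finditer(pattern, string, flags):
            _set_match(m)
        return re_.findall(pattern, string, flags)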
    
    @wraps(re_.finditer)
    def finditer(pattern, string, flags=0):
        for match in re_.finditer(pattern, string, flags):
            yield _set_match(match)
    
    def expand(template):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.expand(template)
    
    def group(*indices):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.group(*indices)
    
    def groups(default=None):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.groups(default)
    
    def groupdict(default=None):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.groupdict(default)
    
    def start(group=0):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.start(group)
    
    def end(group=0):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.end(group)
    
    def span(group=0):
        if last_match is None:
            raise TypeError("No successful match yet.")
        return last_match.span(group)
    
    del wraps  # Not needed past module compilation
    

    For example:

    if gre.match("foo(.+)", var):
        pass  # do something with gre.group(1)
    elif gre.match("bar(.+)", var):
        pass  # do something with gre.group(1)
    elif gre.match("baz(.+)", var):
        pass  # do something with gre.group(1)
    
  • 2020-12-04 11:52

    How about using a dictionary?

    match_objects = {}
    
    if match_objects.setdefault( 'mo_foo', re_foo.search( text ) ):
        pass  # do something with match_objects[ 'mo_foo' ]
    
    elif match_objects.setdefault( 'mo_bar', re_bar.search( text ) ):
        pass  # do something with match_objects[ 'mo_bar' ]
    
    elif match_objects.setdefault( 'mo_baz', re_baz.search( text ) ):
        pass  # do something with match_objects[ 'mo_baz' ]
    
    ...
    

    However, you must make sure the match_objects keys ( mo_foo, mo_bar, ... ) are unique, ideally by giving each regular expression its own name and naming its key accordingly. If a key already exists, match_objects.setdefault() returns the match object stored earlier and discards the result of the new re_xxx.search( text ) call.
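
    The pitfall with a reused key can be seen in a small sketch. The pattern re_foo and the sample strings are made up for illustration; only dict.setdefault() and re.search() behave as in the standard library.

```python
import re

# Hypothetical pattern and inputs, chosen only to show the stale-key effect.
re_foo = re.compile(r"foo(.+)")
match_objects = {}

# First use of the key: the match is stored and returned.
first = match_objects.setdefault("mo_foo", re_foo.search("foo123"))

# Same key again: the search still runs, but setdefault() ignores its result
# and hands back the match that is already stored.
second = match_objects.setdefault("mo_foo", re_foo.search("fooXYZ"))

print(first.group(1))
print(second.group(1))
print(second is first)
```

    Both group(1) calls give the text captured by the first search, because the second match object is never stored.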

  • 2020-12-04 11:56

    Alternatively, something not using regular expressions at all:

    prefix, data = var[:3], var[3:]
    if prefix == 'foo':
        pass  # do something with data
    elif prefix == 'bar':
        pass  # do something with data
    elif prefix == 'baz':
        pass  # do something with data
    else:
        pass  # do something with var
    

    Whether that is suitable depends on your actual problem. Keep in mind that in Python, regular expressions aren't the Swiss Army knife that they are in Perl; Python has other constructs for string manipulation.
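
    One such construct is str.startswith(). Slicing with var[:3] only works while every prefix is exactly three characters long; a sketch that drops that assumption could look like this. The helper name split_prefix and its return convention are hypothetical, not part of the answer above.

```python
def split_prefix(var, prefixes=("foo", "bar", "baz")):
    """Return (prefix, rest) for the first prefix that var starts with.

    Returns (None, var) when no prefix applies. Hypothetical helper.
    """
    for prefix in prefixes:
        if var.startswith(prefix):
            # Slice by the prefix's own length, so prefixes may differ in size.
            return prefix, var[len(prefix):]
    return None, var


prefix, data = split_prefix("bar42")
print(prefix, data)
```

    Unlike the pattern foo(.+), this also accepts an input that consists of the bare prefix, in which case the remainder is an empty string.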

  • 2020-12-04 11:56
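    import re
    
    def find_first_match(string, *regexes):
        # Try each (regex, handler) pair in order; the first match wins.
        for regex, handler in regexes:
            m = re.search(regex, string)
            if m:
                handler(m)
                return
        else:
            # Reached only when no regex matched.
            raise ValueError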
    
    find_first_match(
        foo, 
        (r'foo(.+)', handle_foo), 
        (r'bar(.+)', handle_bar), 
        (r'baz(.+)', handle_baz))
    

    To speed this up, one could combine all the regexes into a single one internally and build the dispatcher on the fly. Ideally, this would then be turned into a class.
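
    A rough sketch of that idea, under some assumptions: each pattern is wrapped in a named group (rule0, rule1, ...), the alternation is compiled once, and Match.lastgroup tells which rule matched. The name make_dispatcher and the convention that handlers receive the rule's own captured groups as arguments (rather than the match object) are choices made here, not part of the answer above.

```python
import re

def make_dispatcher(*rules):
    """Combine (pattern, handler) pairs into one compiled alternation."""
    parts = []
    table = {}   # wrapper group name -> (handler, slice of the rule's own groups)
    offset = 0   # capturing groups emitted so far
    for i, (pattern, handler) in enumerate(rules):
        name = f"rule{i}"
        n = re.compile(pattern).groups   # capturing groups inside this pattern
        parts.append(f"(?P<{name}>{pattern})")
        # In m.groups(), index `offset` is the wrapper; the next n are the rule's own.
        table[name] = (handler, slice(offset + 1, offset + 1 + n))
        offset += 1 + n
    combined = re.compile("|".join(parts))

    def dispatch(string):
        m = combined.search(string)
        if m is None:
            raise ValueError(f"no pattern matched {string!r}")
        # The wrapper group closes last, so lastgroup names the winning rule.
        handler, own_groups = table[m.lastgroup]
        return handler(*m.groups()[own_groups])

    return dispatch


dispatch = make_dispatcher(
    (r"foo(.+)", lambda rest: ("foo", rest)),
    (r"bar(.+)", lambda rest: ("bar", rest)),
    (r"baz(.+)", lambda rest: ("baz", rest)),
)
print(dispatch("bar42"))
```

    Two differences from the loop version are worth keeping in mind. The combined search returns the leftmost match in the string, whereas the loop returns the first pattern in list order that matches anywhere. Patterns that use numbered backreferences would also need adjusting, because wrapping shifts the group numbers.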

  • 2020-12-04 11:57

    My solution would be:

    import re
    
    class Found(Exception): pass
    
    try:        
        for m in re.finditer('bar(.+)', var):
            # Do something
            raise Found
    
        for m in re.finditer('foo(.+)', var):
            # Do something else
            raise Found
    
    except Found: pass
    
  • 2020-12-04 11:58

    A minimalist DataHolder:

    class Holder(object):
        def __call__(self, *x):
            if x:
                self.x = x[0]
            return self.x
    
    data = Holder()
    
    if data(re.search(r'foo (\d+)', string)):
        print(data().group(1))
    

    or as a singleton function:

    def data(*x):
        if x:
            data.x = x[0]
        return data.x
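
    On Python 3.8 or later, a holder like this may not be needed at all, since an assignment expression (the := operator from PEP 572) can bind the match inside the condition. A minimal sketch, assuming a hypothetical function classify and the foo/bar/baz patterns used elsewhere in this thread:

```python
import re

def classify(var):
    """Return (tag, captured text) for the first pattern that matches var."""
    if m := re.match(r"foo(.+)", var):
        return "foo", m.group(1)
    elif m := re.match(r"bar(.+)", var):
        return "bar", m.group(1)
    elif m := re.match(r"baz(.+)", var):
        return "baz", m.group(1)
    # No pattern matched: hand the input back unchanged.
    return None, var


print(classify("baz9"))
```

    Each branch tests and names the match in one step, which is close to the shape of the Perl if/elsif chain. On older interpreters the holder above remains the way to go.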
    