Hashtable/dictionary/map lookup with regular expressions

难免孤独 2021-02-01 05:36

I'm trying to figure out if there's a reasonably efficient way to perform a lookup in a dictionary (or a hash, or a map, or whatever your favorite language calls it) where the

19 answers
  •  暗喜
    暗喜 (original poster)
    2021-02-01 06:27

    Here's an efficient way to do it by combining the keys into a single compiled regexp, and so not requiring any looping over key patterns. It abuses the lastindex to find out which key matched. (It's a shame regexp libraries don't let you tag the terminal state of the DFA that a regexp is compiled to, or this would be less of a hack.)

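    The expression is compiled once, so a lookup is a single call into the regexp engine rather than a sequential search over key patterns. In an engine that compiles to a DFA, common prefixes are compiled together, so each character in the key is matched once, not many times, unlike some of the other suggested solutions. Python's re module is a backtracking matcher rather than a true DFA, so there the alternatives are tried left to right and the first key pattern that matches wins. Either way, you're effectively compiling a mini lexer for your keyspace.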
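
    As a minimal sketch of the mechanism, the snippet below shows how match.lastindex identifies which top-level group (and so which key pattern) matched, and how the order of alternatives decides ties in Python's re module. The variable names are illustrative only.

```python
import re

# Two alternatives, each wrapped in its own top-level capturing group.
pattern = re.compile(r'(foo.)|(FileN.*)')

m = pattern.match('FileNotFound')
print(m.lastindex)  # 2: the second alternative matched

# Alternatives are tried in order, so the first one that matches wins,
# even when a later one would match more of the input.
ordered = re.compile(r'(foo)|(foobar)')
print(ordered.match('foobar').lastindex)  # 1
```

    Because of this, more specific key patterns should be listed before more general ones.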

    This map isn't extensible (can't define new keys) without recompiling the regexp, but it can be handy for some situations.

    # Regular expression map
    # Abuses match.lastindex to figure out which key was matched
    # (i.e. to emulate extracting the terminal state of the DFA of the regexp engine)
    # Mostly for amusement.
    # Richard Brooksby, Ravenbrook Limited, 2013-06-01
    
    import re
    
    class ReMap(object):
    
        def __init__(self, items):
            if not items:
                items = [(r'epsilon^', None)] # Match nothing
            key_patterns = []
            self.lookup = {}
            index = 1
            for key, value in items:
                # Ensure there are no capturing parens in the key, because
                # that would mess up match.lastindex. This rewrite is naive:
                # it also alters escaped parens and (?...) extension syntax.
                key_patterns.append('(' + re.sub(r'\((?!\?:)', '(?:', key) + ')')
                self.lookup[index] = value
                index += 1
            self.keys_re = re.compile('|'.join(key_patterns))
    
        def __getitem__(self, key):
            m = self.keys_re.match(key)
            if m:
                return self.lookup[m.lastindex]
            raise KeyError(key)
    
    if __name__ == '__main__':
        remap = ReMap([(r'foo.', 12), (r'FileN.*', 35)])
        # match() anchors at the start only, so a key pattern may match a prefix.
        print(remap['food'])  # 12
        print(remap['foot in my mouth'])  # 12
        print(remap['FileNotFoundException: file.x does not exist'])  # 35
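
    The limitation mentioned above (no new keys without recompiling) can be softened by recompiling lazily. The following is a sketch only: LazyReMap and its methods are hypothetical names, not part of the original code, and it assumes that rebuilding the combined regexp on the first lookup after an insertion is acceptable.

```python
import re

class LazyReMap:
    """Hypothetical variant of ReMap that accepts new keys after construction."""

    def __init__(self, items=()):
        self._items = list(items)   # (pattern, value) pairs, in priority order
        self._compiled = None       # combined regexp, rebuilt on demand

    def __setitem__(self, pattern, value):
        self._items.append((pattern, value))
        self._compiled = None       # invalidate; recompile on next lookup

    def _compile(self):
        # Same trick as above: one top-level group per key, inner groups
        # made non-capturing so match.lastindex identifies the key.
        parts = ['(' + re.sub(r'\((?!\?:)', '(?:', p) + ')'
                 for p, _ in self._items]
        # '(?!)' is an empty negative lookahead, which never matches.
        self._compiled = re.compile('|'.join(parts) or '(?!)')

    def __getitem__(self, key):
        if self._compiled is None:
            self._compile()
        m = self._compiled.match(key)
        if m:
            return self._items[m.lastindex - 1][1]
        raise KeyError(key)

remap = LazyReMap([(r'foo.', 12)])
remap[r'FileN.*'] = 35              # added after construction
print(remap['food'])                # 12
print(remap['FileNotFoundException: file.x does not exist'])  # 35
```

    Insertions are cheap, and the cost of recompiling is paid once per batch of new keys rather than once per key.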
    
