Is it worth using Python's re.compile?

前端 未结 26 2100
旧时难觅i
旧时难觅i 2020-11-22 12:51

Is there any benefit in using compile for regular expressions in Python?

h = re.compile(\'hello\')
h.match(\'hello world\')

vs



        
26条回答
  •  没有蜡笔的小新
    2020-11-22 13:19

    i'd like to motivate that pre-compiling is both conceptually and 'literately' (as in 'literate programming') advantageous. have a look at this code snippet:

    from re import compile as _Re
    
    class TYPO:
    
      def text_has_foobar( self, text ):
        return self._text_has_foobar_re_search( text ) is not None
      _text_has_foobar_re_search = _Re( r"""(?i)foobar""" ).search
    
    TYPO = TYPO()
    

    in your application, you'd write:

    from TYPO import TYPO
    print( TYPO.text_has_foobar( 'FOObar ) )
    

    this is about as simple in terms of functionality as it can get. because this is example is so short, i conflated the way to get _text_has_foobar_re_search all in one line. the disadvantage of this code is that it occupies a little memory for whatever the lifetime of the TYPO library object is; the advantage is that when doing a foobar search, you'll get away with two function calls and two class dictionary lookups. how many regexes are cached by re and the overhead of that cache are irrelevant here.

    compare this with the more usual style, below:

    import re
    
    class Typo:
    
      def text_has_foobar( self, text ):
        return re.compile( r"""(?i)foobar""" ).search( text ) is not None
    

    In the application:

    typo = Typo()
    print( typo.text_has_foobar( 'FOObar ) )
    

    I readily admit that my style is highly unusual for python, maybe even debatable. however, in the example that more closely matches how python is mostly used, in order to do a single match, we must instantiate an object, do three instance dictionary lookups, and perform three function calls; additionally, we might get into re caching troubles when using more than 100 regexes. also, the regular expression gets hidden inside the method body, which most of the time is not such a good idea.

    be it said that every subset of measures---targeted, aliased import statements; aliased methods where applicable; reduction of function calls and object dictionary lookups---can help reduce computational and conceptual complexity.

提交回复
热议问题