Is there any benefit in using compile for regular expressions in Python?
h = re.compile(\'hello\')
h.match(\'hello world\')
vs
i'd like to motivate that pre-compiling is both conceptually and 'literately' (as in 'literate programming') advantageous. have a look at this code snippet:
from re import compile as _Re
class TYPO:
def text_has_foobar( self, text ):
return self._text_has_foobar_re_search( text ) is not None
_text_has_foobar_re_search = _Re( r"""(?i)foobar""" ).search
TYPO = TYPO()
in your application, you'd write:
from TYPO import TYPO
print( TYPO.text_has_foobar( 'FOObar ) )
this is about as simple in terms of functionality as it can get. because this is example is so short, i conflated the way to get _text_has_foobar_re_search
all in one line. the disadvantage of this code is that it occupies a little memory for whatever the lifetime of the TYPO
library object is; the advantage is that when doing a foobar search, you'll get away with two function calls and two class dictionary lookups. how many regexes are cached by re
and the overhead of that cache are irrelevant here.
compare this with the more usual style, below:
import re
class Typo:
def text_has_foobar( self, text ):
return re.compile( r"""(?i)foobar""" ).search( text ) is not None
In the application:
typo = Typo()
print( typo.text_has_foobar( 'FOObar ) )
I readily admit that my style is highly unusual for python, maybe even debatable. however, in the example that more closely matches how python is mostly used, in order to do a single match, we must instantiate an object, do three instance dictionary lookups, and perform three function calls; additionally, we might get into re
caching troubles when using more than 100 regexes. also, the regular expression gets hidden inside the method body, which most of the time is not such a good idea.
be it said that every subset of measures---targeted, aliased import statements; aliased methods where applicable; reduction of function calls and object dictionary lookups---can help reduce computational and conceptual complexity.