Is there a way to really pickle compiled regular expressions in Python?

一整个雨季 2020-12-13 15:00

I have a Python console application that contains 300+ regular expressions. The set of regular expressions is fixed for each release. When users run the app, the entire se

7 Answers
  •  一生所求
    2020-12-13 15:27

    As others have mentioned, you can simply pickle the compiled regex. It will pickle and unpickle just fine and remain usable. However, the pickle does not appear to contain the result of compilation, so I suspect you will incur the compilation overhead again when you unpickle it.

    >>> import pickle as p
    >>> import re
    >>> p.dumps(re.compile("a*b+c*"))
    "cre\n_compile\np1\n(S'a*b+c*'\np2\nI0\ntRp3\n."
    >>> p.dumps(re.compile("a*b+c*x+y*"))
    "cre\n_compile\np1\n(S'a*b+c*x+y*'\np2\nI0\ntRp3\n."
    

    In these two tests, the only difference between the two pickles is the pattern string. Apparently a compiled regex does not pickle its compiled form, just the string (and flags) needed to compile it again.
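The session above shows Python 2 output. As a rough sketch of the same check on Python 3 (the variable names here are mine, and the exact pickle bytes depend on the pickle protocol in use), you can confirm that the pattern source appears verbatim in the pickle and that the restored object still works:

```python
import pickle
import re

first = re.compile("a*b+c*")
second = re.compile("a*b+c*x+y*")

first_bytes = pickle.dumps(first)
second_bytes = pickle.dumps(second)

# The pattern source text is stored verbatim inside the pickle.
print(b"a*b+c*" in first_bytes)

# Both pickles are small; the longer pattern gives the longer pickle.
print(len(first_bytes), len(second_bytes))

# Unpickling hands back a usable pattern with the same source and flags.
restored = pickle.loads(first_bytes)
print(restored.pattern == first.pattern, restored.flags == first.flags)
print(restored.fullmatch("aabbc") is not None)
```

This only shows what ends up in the pickle; it does not measure how long unpickling takes.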

    But I'm wondering about your application overall. Compiling a regex is a fast operation, so how short are your jobs that compilation time is significant? One possibility is that you are compiling all 300 regexes and then using only one for a short job. In that case, don't compile them all up front. The re module caches compiled regexes, so you generally don't have to compile them yourself; just use the string form. The re module will look up the string in a dictionary of compiled regexes, so holding on to the compiled form yourself only saves you a dictionary lookup. I may be totally off base; sorry if so.
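If the "don't compile them all up front" idea applies, one way to sketch it is to compile each pattern the first time it is needed. The names below (`PATTERN_SOURCES`, `get_pattern`, `find_all`) and the patterns themselves are hypothetical, not from the question. The re module's own cache is bounded and its size is an implementation detail, so with 300+ patterns this sketch keeps its own cache with `functools.lru_cache` instead of relying on it:

```python
import re
from functools import lru_cache

# Hypothetical table of pattern sources; a real app would list its 300+ here.
PATTERN_SOURCES = {
    "word": r"[A-Za-z]+",
    "number": r"\d+",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
}


@lru_cache(maxsize=None)
def get_pattern(name):
    """Compile the named pattern on first use and reuse it afterwards."""
    return re.compile(PATTERN_SOURCES[name])


def find_all(name, text):
    """Run the named pattern over text, compiling it only if needed."""
    return get_pattern(name).findall(text)


dates = find_all("iso_date", "released 2020-12-13, patched 2020-12-20")
print(dates)

# Only the pattern that was actually used has been compiled so far.
print(get_pattern.cache_info().currsize)
```

A short job that touches one pattern then pays for one compilation rather than all of them.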
