I have a python console application that contains 300+ regular expressions. The set of regular expressions is fixed for each release. When users run the app, the entire se
OK, this isn't pretty, but it might be what you want. I looked at the sre_compile.py module from Python 2.6, and ripped out a bit of it, chopped it in half, and used the two pieces to pickle and unpickle compiled regexes:
import re, sre_compile, sre_parse, _sre
import cPickle as pickle
# the first half of sre_compile.compile
def raw_compile(p, flags=0):
# internal: convert pattern list to internal format
if sre_compile.isstring(p):
pattern = p
p = sre_parse.parse(p, flags)
else:
pattern = None
code = sre_compile._code(p, flags)
return p, code
# the second half of sre_compile.compile
def build_compiled(pattern, p, flags, code):
# print code
# XXX: get rid of this limitation!
if p.pattern.groups > 100:
raise AssertionError(
"sorry, but this version only supports 100 named groups"
)
# map in either direction
groupindex = p.pattern.groupdict
indexgroup = [None] * p.pattern.groups
for k, i in groupindex.items():
indexgroup[i] = k
return _sre.compile(
pattern, flags | p.pattern.flags, code,
p.pattern.groups-1,
groupindex, indexgroup
)
def pickle_regexes(regexes):
picklable = []
for r in regexes:
p, code = raw_compile(r, re.DOTALL)
picklable.append((r, p, code))
return pickle.dumps(picklable)
def unpickle_regexes(pkl):
regexes = []
for r, p, code in pickle.loads(pkl):
regexes.append(build_compiled(r, p, re.DOTALL, code))
return regexes
regexes = [
r"^$",
r"a*b+c*d+e*f+",
]
pkl = pickle_regexes(regexes)
print pkl
print unpickle_regexes(pkl)
I don't really know if this works, or if it speeds things up. I know it prints a list of regexes when I try it. It might be very specific to version 2.6, I also don't know that.