Is there a way to really pickle compiled regular expressions in python?

一整个雨季 2020-12-13 15:00

I have a python console application that contains 300+ regular expressions. The set of regular expressions is fixed for each release. When users run the app, the entire se

7 Answers
  •  遥遥无期
    2020-12-13 15:26

    OK, this isn't pretty, but it might be what you want. I looked at the sre_compile.py module from Python 2.6, and ripped out a bit of it, chopped it in half, and used the two pieces to pickle and unpickle compiled regexes:

    import re, sre_compile, sre_parse, _sre
    import cPickle as pickle
    
    # the first half of sre_compile.compile    
    def raw_compile(p, flags=0):
        # internal: convert pattern list to internal format
    
        if sre_compile.isstring(p):
            pattern = p
            p = sre_parse.parse(p, flags)
        else:
            pattern = None
    
        code = sre_compile._code(p, flags)
    
        return p, code
    
    # the second half of sre_compile.compile
    def build_compiled(pattern, p, flags, code):
        # print code
    
        # XXX:  get rid of this limitation!
        if p.pattern.groups > 100:
            raise AssertionError(
                "sorry, but this version only supports 100 named groups"
                )
    
        # map in either direction
        groupindex = p.pattern.groupdict
        indexgroup = [None] * p.pattern.groups
        for k, i in groupindex.items():
            indexgroup[i] = k
    
        return _sre.compile(
            pattern, flags | p.pattern.flags, code,
            p.pattern.groups-1,
            groupindex, indexgroup
            )
    
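    def pickle_regexes(regexes):
        # Pickle the pattern string, the parsed pattern and the compiled code
        # for each regex; re.DOTALL is hard-coded here and in unpickle_regexes.
        picklable = []
        for r in regexes:
            p, code = raw_compile(r, re.DOTALL)
            picklable.append((r, p, code))
        return pickle.dumps(picklable)
    
    def unpickle_regexes(pkl):
        # Rebuild the pattern objects from the pickled pieces without parsing
        # again; the flags must match the ones used in pickle_regexes.
        regexes = []
        for r, p, code in pickle.loads(pkl):
            regexes.append(build_compiled(r, p, re.DOTALL, code))
        return regexes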
    
    regexes = [
        r"^$",
        r"a*b+c*d+e*f+",
        ]
    
    pkl = pickle_regexes(regexes)
    print pkl
    print unpickle_regexes(pkl)
    

    I don't really know if this works, or if it speeds things up. I know it prints a list of regexes when I try it. It might be very specific to version 2.6; I don't know that either.
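
The speed question can be checked with a small timing harness. The sketch below is for Python 3, not the 2.6 code above, and it times only the standard `pickle` round trip against a fresh `re.compile` pass. In CPython, the stock pickle support for a compiled pattern stores the pattern string and flags and compiles again on load, which the `args` tuple makes visible. `PATTERNS`, `FLAGS`, `compile_all` and `unpickle_all` are hypothetical names made up for this example, and the generated patterns are stand-ins for the application's real set.

```python
import copyreg
import pickle
import re
import timeit

# Hypothetical stand-ins for the application's 300+ patterns.
PATTERNS = [r"^$", r"a*b+c*d+e*f+"] + [
    r"item%d\s+(\d+)-(\w+)" % i for i in range(300)
]
FLAGS = re.DOTALL


def compile_all():
    """Compile every pattern from source with an empty re cache."""
    re.purge()
    return [re.compile(p, FLAGS) for p in PATTERNS]


def unpickle_all(blob):
    """Load a pickled list of compiled patterns, also with an empty re cache."""
    re.purge()
    return pickle.loads(blob)


compiled = compile_all()
blob = pickle.dumps(compiled)

# Inspect what the registered reducer stores for one pattern: a callable
# plus (pattern string, flags), not the compiled program itself.
reducer = copyreg.dispatch_table[type(compiled[0])]
func, args = reducer(compiled[0])

if __name__ == "__main__":
    print("stored for first pattern:", args)
    t_compile = min(timeit.repeat(compile_all, number=1, repeat=5))
    t_unpickle = min(timeit.repeat(lambda: unpickle_all(blob), number=1, repeat=5))
    print("compile:  %.4f s" % t_compile)
    print("unpickle: %.4f s" % t_unpickle)
```

Both functions call `re.purge()` first so that neither timing benefits from the module's internal cache. The same harness could wrap `pickle_regexes` and `unpickle_regexes` on a 2.6 interpreter to see whether skipping the parse step is worth it there.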
