Clean method for storing regex strings in python

问题

I want to store a series of pre-tested regexes in a config file, and read and apply them at runtime.

However, because they're commonly packed with escape characters, by the time I've loaded them up into memory, and populated them into a dict, they've been escaped to death.

How can I preserve the integrity of my regex definitions, so that they will re.compile?

Alternately, given that many of the post-escape strings end up in a form with \x00 characters, how do I convert these back into a form that will be consumed correctly by re.compile?

e.g. I have written in a file, the regex "\btest\b". If I want to put this into a re.compile, I can force it to do so with re.compile(r"\btest\b"). However, I don't want to write this code by hand, I want to lift it from a file, and process it as a variable (I've got 000's of these to deal with here) .

There doesn't seem to be a way to r a string variable, and so I'm left trying to compile with '\x08test\x08', which doesn't do what I want it to.

This must be a fairly regular issue - how do others deal with this problem?

回答1:

Like the comment says, there is no need to do anything special.

Imagine a utf-8 encoded text file named regexps.txt with one regex on each line, then creating a list of compiled regexps from that file would be something like:

with open('regexps.txt', encoding='utf8') as f:
    compiled_regexps = [re.compile(line) for line in f]

来源：https://stackoverflow.com/questions/52991259/clean-method-for-storing-regex-strings-in-python

标签

python

regex

escaping