Question
I want to store a series of pre-tested regexes in a config file, and read and apply them at runtime.
However, because they're commonly packed with escape characters, by the time I've loaded them up into memory, and populated them into a dict, they've been escaped to death.
How can I preserve the integrity of my regex definitions, so that they will re.compile?
Alternatively, given that many of the post-escape strings end up in a form with \x00-style characters, how do I convert these back into a form that will be consumed correctly by re.compile?
For example, I have written the regex "\btest\b" in a file. If I want to put this into a re.compile, I can force it to work with re.compile(r"\btest\b"). However, I don't want to write this code by hand; I want to lift it from a file and process it as a variable (I've got thousands of these to deal with).
There doesn't seem to be a way to apply the r prefix to a string variable, so I'm left trying to compile '\x08test\x08', which doesn't do what I want.
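To make the symptom concrete, here is a minimal sketch of the difference between the two literal forms. The variable names cooked and raw and the sample text are only illustrative; they are not from the original post.

```python
import re

# A normal string literal: the Python parser turns \b into a backspace (0x08).
cooked = "\btest\b"
# A raw string literal: the backslash and the 'b' stay as two characters.
raw = r"\btest\b"

print(repr(cooked))           # '\x08test\x08'
print(len(cooked), len(raw))  # 6 8

# Only the raw form reaches the regex engine as a word-boundary pattern.
print(bool(re.search(raw, "a test here")))     # True
print(bool(re.search(cooked, "a test here")))  # False
```

The r prefix only changes how source code is parsed; it is not an operation that can be applied to an existing string value.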
This must be a fairly regular issue - how do others deal with this problem?
Answer 1:
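Like the comment says, there is no need to do anything special. Backslash escapes are only interpreted in string literals in Python source code; text read from a file arrives with its backslashes intact.
Imagine a UTF-8 encoded text file named regexps.txt with one regex on each line. Creating a list of compiled regexps from that file would then look something like: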
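import re
with open('regexps.txt', encoding='utf8') as f:
    # Strip the trailing newline, which would otherwise become part of the pattern.
    compiled_regexps = [re.compile(line.rstrip('\n')) for line in f if line.strip()]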
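As a rough check of this claim, the sketch below writes two patterns to a temporary file and reads them back. The file contents, the second pattern, and the sample strings are assumptions made up for illustration. The trailing newline is stripped from each line, since it would otherwise be compiled as part of the pattern.

```python
import os
import re
import tempfile

# Hypothetical file contents: one pattern per line, typed exactly as it
# would appear inside a raw string (the doubled backslashes here are only
# needed because this is a Python literal).
patterns_text = "\\btest\\b\n\\d{3}-\\d{4}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regexps.txt")
    with open(path, "w", encoding="utf8") as f:
        f.write(patterns_text)
    with open(path, encoding="utf8") as f:
        # Drop the line terminator and skip blank lines.
        lines = [line.rstrip("\n") for line in f if line.strip()]

compiled = [re.compile(p) for p in lines]

print(lines[0] == r"\btest\b")                    # True
print(bool(compiled[0].search("a test here")))    # True
print(bool(compiled[0].search("attested")))       # False
print(bool(compiled[1].search("call 555-1234")))  # True
```

The string that comes back from the file is the same as the raw literal, so no unescaping step is needed before re.compile.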
Source: https://stackoverflow.com/questions/52991259/clean-method-for-storing-regex-strings-in-python