Clean method for storing regex strings in Python

Submitted by 你离开我真会死。 on 2021-01-28 04:40:02

Question


I want to store a series of pre-tested regexes in a config file, and read and apply them at runtime.

However, because they're commonly packed with escape characters, by the time I've loaded them up into memory, and populated them into a dict, they've been escaped to death.

How can I preserve the integrity of my regex definitions, so that they will re.compile?

Alternately, given that many of the post-escape strings end up in a form with \x00 characters, how do I convert these back into a form that will be consumed correctly by re.compile?

For example, I have the regex \btest\b written in a file. If I were writing the code by hand, I could pass it to re.compile as re.compile(r"\btest\b"). However, I don't want to write this by hand; I want to read it from a file and process it as a variable (I have thousands of these to deal with).

There doesn't seem to be a way to apply the r prefix to a string variable, so I'm left trying to compile '\x08test\x08', which doesn't do what I want.

This must be a fairly regular issue - how do others deal with this problem?


Answer 1:


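As the comment says, there is no need to do anything special. Escape sequences such as \b are interpreted only when Python parses a string literal in source code; text read from a file keeps its backslashes exactly as written, so it can be passed straight to re.compile.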

Imagine a UTF-8 encoded text file named regexps.txt with one regex on each line. Creating a list of compiled regexps from that file would then look something like this (the trailing newline is stripped from each line so that it does not become part of the pattern):

import re

with open('regexps.txt', encoding='utf8') as f:
    compiled_regexps = [re.compile(line.rstrip('\n')) for line in f]
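Since the question mentions populating a dict, here is a rough sketch of one way to do that. The file layout (one `name = pattern` entry per line, with `#` starting a comment line) and the helper name `load_patterns` are assumptions made for illustration; neither comes from the question or the answer.

```python
import re


def load_patterns(lines):
    """Build {name: compiled regex} from 'name = pattern' lines (assumed format)."""
    patterns = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first '=' only, so the pattern itself may contain '='.
        name, sep, pattern = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'name = pattern'")
        try:
            patterns[name.strip()] = re.compile(pattern.strip())
        except re.error as exc:
            raise ValueError(f"line {lineno}: invalid regex: {exc}") from exc
    return patterns


# Any iterable of lines works, including an open file object.
sample = [
    "# word-boundary match\n",
    r"word = \btest\b" + "\n",
    r"date = \d{4}-\d{2}-\d{2}" + "\n",
]
patterns = load_patterns(sample)
print(patterns["word"].search("a test here") is not None)    # True
print(patterns["date"].fullmatch("2021-01-28") is not None)  # True
```

Note that this sketch trims whitespace around each pattern, so a regex that needs a leading or trailing space would have to spell it as `\s` or `[ ]`. Reporting the line number on failure is a design choice that helps when the file holds thousands of entries.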


Source: https://stackoverflow.com/questions/52991259/clean-method-for-storing-regex-strings-in-python
