How to compile multiple regexes in one go? Is it more efficient? - python

Question

Let's say I have code like this:

import re
docid_re = re.compile(r'<DOCID>([^>]+)</DOCID>')
doctype_re = re.compile(r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>')
datetime_re = re.compile(r'<DATETIME>([^>]+)</DATETIME>')

I could also do this:

>>> import re
>>> docid_re = r'<DOCID>([^>]+)</DOCID>'
>>> doctype_re = r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>'
>>> datetime_re = r'<DATETIME>([^>]+)</DATETIME>'
>>> docid_re, doctype_re, datetime_re = map(re.compile, [docid_re, doctype_re, datetime_re])
>>> docid_re
<_sre.SRE_Pattern object at 0x7f0314eee438>

But is there any real gain in speed or memory when I use map()?

Answer 1:

Don't listen to anybody, just measure it! You can use the timeit module for that. But remember that "premature optimization is the root of all evil" (c) Donald Knuth.

By the way, the answer to your question is: no, it doesn't help at all.
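One way to take that measurement is sketched below. It is a minimal example, not a rigorous benchmark: the function names compile_separately and compile_with_map and the repetition count are chosen here for illustration, and the patterns are the three from the question.

```python
import re
import timeit

# The three patterns from the question.
PATTERNS = [
    r'<DOCID>([^>]+)</DOCID>',
    r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>',
    r'<DATETIME>([^>]+)</DATETIME>',
]


def compile_separately():
    # One explicit re.compile call per pattern, as in the first snippet.
    docid_re = re.compile(PATTERNS[0])
    doctype_re = re.compile(PATTERNS[1])
    datetime_re = re.compile(PATTERNS[2])
    return docid_re, doctype_re, datetime_re


def compile_with_map():
    # tuple() forces evaluation, because map() is lazy in Python 3.
    return tuple(map(re.compile, PATTERNS))


if __name__ == "__main__":
    # The repetition count is arbitrary; raise it for more stable numbers.
    for func in (compile_separately, compile_with_map):
        seconds = timeit.timeit(func, number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s")
```

Keep in mind that the re module caches compiled patterns, so after the first iteration both functions mostly measure call and lookup overhead rather than actual compilation.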

Answer 2:

If you were compiling a lot of regexes, map might help by avoiding lookup costs involved in finding re, then getting its compile attribute each call; with map, you look up map once and re.compile once, and then it gets used over and over without further lookups. Of course, when you need to construct a list to use it, you eat into that savings. Practically speaking, you'd need an awful lot of regexes to reach the point where map would be worth your while; for three, it's probably a loss.

Even when it did help, it would be the tiniest of microoptimizations. I would do it if it made the code cleaner; performance is a tertiary concern here at best. There are cases (say, parsing a huge text file of integers into ints) where map can be a big win because the overhead of starting it up is compensated for by the reduced lookup and Python byte code execution overhead. But this is not one of those cases, and those cases are so rare as to not be worth worrying about 99.99% of the time.
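To see the kind of case where map may pay off, here is a rough sketch that compares an explicit loop against map(int, ...) on many tokens. The tokens list is a hypothetical stand-in for the contents of a large file of integers, and the function names and repetition count are illustrative assumptions; actual timings depend on the Python version and machine.

```python
import timeit

# Hypothetical stand-in for the tokens read from a large file of integers.
tokens = [str(n) for n in range(100_000)]


def parse_with_loop():
    # Each iteration runs Python byte code and looks up int and append again.
    result = []
    for token in tokens:
        result.append(int(token))
    return result


def parse_with_map():
    # int is looked up once; the iteration itself happens inside map().
    return list(map(int, tokens))


if __name__ == "__main__":
    # The repetition count is arbitrary; adjust it to taste.
    for func in (parse_with_loop, parse_with_map):
        seconds = timeit.timeit(func, number=20)
        print(f"{func.__name__}: {seconds:.3f} s")
```

With only three regexes, as in the question, there is far too little work per call for this effect to matter.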

Source: https://stackoverflow.com/questions/32873378/how-to-compile-multiple-multiple-regexes-in-one-go-is-it-more-efficient-pyth

Tags

python

regex

dictionary

compilation