Question
Let's say I have code like this:
import re
docid_re = re.compile(r'<DOCID>([^>]+)</DOCID>')
doctype_re = re.compile(r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>')
datetime_re = re.compile(r'<DATETIME>([^>]+)</DATETIME>')
I could also do this:
>>> import re
>>> docid_re = r'<DOCID>([^>]+)</DOCID>'
>>> doctype_re = r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>'
>>> datetime_re = r'<DATETIME>([^>]+)</DATETIME>'
>>> docid_re, doctype_re, datetime_re = map(re.compile, [docid_re, doctype_re, datetime_re])
>>> docid_re
<_sre.SRE_Pattern object at 0x7f0314eee438>
But is there any real gain in speed or memory when I use map()?
Answer 1:
Don't take anybody's word for it, just measure it! You can use the timeit module for that. But remember that "premature optimization is the root of all evil" (Donald Knuth).
By the way, the answer to your question is: no, it doesn't help at all.
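To make the "just measure it" advice concrete, here is a minimal sketch of such a timeit comparison. The names PATTERNS, compile_separately and compile_with_map are made up for illustration. One assumption worth stating: the re module caches compiled patterns, so each run calls re.purge() first; otherwise the repeated runs would mostly time cache lookups rather than actual compilation.

```python
import re
import timeit

# The three patterns from the question.
PATTERNS = [
    r'<DOCID>([^>]+)</DOCID>',
    r'<DOCTYPE SOURCE="[^"]+">([^>]+)</DOCTYPE>',
    r'<DATETIME>([^>]+)</DATETIME>',
]


def compile_separately():
    """Compile each pattern with its own re.compile call."""
    re.purge()  # clear re's internal pattern cache so this run really compiles
    docid_re = re.compile(PATTERNS[0])
    doctype_re = re.compile(PATTERNS[1])
    datetime_re = re.compile(PATTERNS[2])
    return docid_re, doctype_re, datetime_re


def compile_with_map():
    """Compile all patterns in one go with map()."""
    re.purge()  # same cache reset, so both variants do the same work
    docid_re, doctype_re, datetime_re = map(re.compile, PATTERNS)
    return docid_re, doctype_re, datetime_re


if __name__ == "__main__":
    runs = 200  # arbitrary repeat count; raise it for more stable numbers
    t_separate = timeit.timeit(compile_separately, number=runs)
    t_map = timeit.timeit(compile_with_map, number=runs)
    print("separate calls: %.4f s for %d runs" % (t_separate, runs))
    print("map():          %.4f s for %d runs" % (t_map, runs))
```

The absolute numbers depend on the machine and Python version, so compare the two results against each other rather than reading much into either one.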
Answer 2:
If you were compiling a lot of regexes, map might help by avoiding the lookup costs involved in finding re, then getting its compile attribute on each call; with map, you look up map once and re.compile once, and then it gets used over and over without further lookups. Of course, when you need to construct a list to use it, you eat into those savings. Practically speaking, you'd need an awful lot of regexes to reach the point where map would be worth your while; for three, it's probably a loss.
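The lookup argument above can be sketched as follows. This is only an illustration: the patterns list and the variable names are hypothetical, and binding re.compile to a local name is shown as an alternative way to get the same "look it up once" effect without map.

```python
import re

# Hypothetical patterns, just to have something to compile.
patterns = [r'<TAG%d>([^>]+)</TAG%d>' % (i, i) for i in range(5)]

# Explicit loop: every iteration looks up the global name `re`
# and then its `compile` attribute before making the call.
compiled_loop = []
for p in patterns:
    compiled_loop.append(re.compile(p))

# map(): `map` and `re.compile` are each looked up once, here,
# and the resulting function object is reused for every item.
compiled_map = list(map(re.compile, patterns))

# Same saving without map: bind the function to a local name once.
compile_pattern = re.compile
compiled_local = [compile_pattern(p) for p in patterns]
```

All three variants produce equivalent pattern objects; they differ only in how many name and attribute lookups happen along the way.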
Even when it did help, it would be the tiniest of micro-optimizations. I would do it if it made the code cleaner; performance is a tertiary concern here at best. There are cases (say, parsing a huge text file of integers into ints) where map can be a big win, because the overhead of starting it up is compensated for by the reduced lookup and Python bytecode execution overhead. But this is not one of those cases, and those cases are so rare as to not be worth worrying about 99.99% of the time.
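For reference, the integer-parsing case mentioned above looks roughly like this. The sample text is made up and tiny; the claimed benefit would only show up with a very large input, so treat this as a shape of the idiom rather than a benchmark.

```python
# Made-up stand-in for the contents of a large text file of integers.
text = "1 2 3 40 500\n6 7 8\n"

# map() hands each whitespace-separated token straight to int,
# without running Python-level loop bytecode for every item.
numbers_map = list(map(int, text.split()))

# Equivalent list comprehension, for comparison.
numbers_comp = [int(token) for token in text.split()]

print(numbers_map)  # [1, 2, 3, 40, 500, 6, 7, 8]
```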
Source: https://stackoverflow.com/questions/32873378/how-to-compile-multiple-multiple-regexes-in-one-go-is-it-more-efficient-pyth