Python: speed for “in” vs regular expression [duplicate]

Submitted by 試著忘記壹切 on 2021-01-29 10:15:17

Question


When determining whether a substring occurs in a larger string, I am considering two options:

(1)

if "aaaa" in "bbbaaaaaabbb":
    dosomething()

(2)

import re

pattern = re.compile("aaaa")
if pattern.search("bbbaaaaaabbb"):
    dosomething()

Which of the two is more efficient and faster, given that the string is huge?

Is there any other option that is faster?

Thanks


Answer 1:


Option (1) is definitely faster. In the future, you can test this yourself with something like the following:

>>> import time, re
>>> if True:
...     s = time.time()
...     "aaaa" in "bbbaaaaaabbb"
...     print(time.time() - s)
... 
True
1.78813934326e-05

>>> if True:
...     s = time.time()
...     pattern = re.compile("aaaa")
...     pattern.search("bbbaaaaaabbb")
...     print(time.time() - s)
... 
<_sre.SRE_Match object at 0xb74a91e0>
0.0143280029297

gnibbler's way of doing this is better. I never really played around with interpreter options, so I didn't know about that one.




Answer 2:


Regex will be slower.

$ python -m timeit '"aaaa" in "bbbaaaaaabbb"'
10000000 loops, best of 3: 0.0767 usec per loop
$ python -m timeit -s 'import re; pattern = re.compile("aaaa")' 'pattern.search("bbbaaaaaabbb")'
1000000 loops, best of 3: 0.356 usec per loop
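
The same comparison can also be run from a script through the timeit module's Python API. The following is only a minimal sketch: the helper names (use_in, use_search, use_compile_and_search, best_time) and the loop counts are made up for illustration, and the absolute numbers will differ by machine and Python version. The third variant calls re.compile inside the timed code; since the re module caches compiled patterns, repeated calls mostly measure that lookup overhead rather than a full compile.

```python
import re
import timeit

HAYSTACK = "bbbaaaaaabbb"
NEEDLE = "aaaa"
PATTERN = re.compile(NEEDLE)  # compiled once, outside the timed code


def use_in():
    """Substring test with the `in` operator."""
    return NEEDLE in HAYSTACK


def use_search():
    """Substring test with a precompiled regular expression."""
    return PATTERN.search(HAYSTACK) is not None


def use_compile_and_search():
    """Same as use_search, but calls re.compile on every invocation."""
    return re.compile(NEEDLE).search(HAYSTACK) is not None


def best_time(func, number=100_000, repeat=3):
    """Return the best observed time per call, in microseconds."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    for func in (use_in, use_search, use_compile_and_search):
        print(f"{func.__name__}: {best_time(func):.4f} usec per call")
```

Taking the minimum over several repeats follows what the command-line tool reports as "best of 3", which tends to reduce noise from other processes.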



Answer 3:


I happen to have the E. coli genome at hand, so I tested the two options. Looking for "AAAA" in the genome 10,000,000 times (just to get measurable times) took about 3.7 seconds with option (1). With option (2), with pattern = re.compile("AAAA") kept outside the loop, it took about 8.4 seconds. In my case, dosomething() just added 1 to an arbitrary variable. The E. coli genome I used is 4,639,675 nucleotides (letters) long.
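
A rough way to try a similar large-string test without a genome file is sketched below. It is not the code used for the numbers above: the sequence is random synthetic data of an assumed length of 1,000,000 letters, and the function names and loop count are hypothetical. Because both approaches stop at the first match, the sketch also times a needle ("NNNN") that cannot occur in an ACGT-only string, which forces a scan of the whole string.

```python
import random
import re
import time

random.seed(0)  # reproducible synthetic data
# Hypothetical stand-in for a genome: 1,000,000 random nucleotides.
SEQUENCE = "".join(random.choices("ACGT", k=1_000_000))


def count_with_in(needle, haystack, loops):
    """Run the `in` test `loops` times and count the successful checks."""
    hits = 0
    for _ in range(loops):
        if needle in haystack:
            hits += 1
    return hits


def count_with_regex(needle, haystack, loops):
    """Same loop, using a pattern compiled once outside the loop."""
    pattern = re.compile(re.escape(needle))
    hits = 0
    for _ in range(loops):
        if pattern.search(haystack):
            hits += 1
    return hits


if __name__ == "__main__":
    loops = 100
    # "NNNN" never occurs in an ACGT-only string, so it is the worst case.
    for needle in ("AAAA", "NNNN"):
        for func in (count_with_in, count_with_regex):
            start = time.perf_counter()
            hits = func(needle, SEQUENCE, loops)
            elapsed = time.perf_counter() - start
            print(f"{func.__name__}({needle!r}): {hits} hits in {elapsed:.4f} s")
```

When the needle appears near the start of the string, the measured time reflects mostly per-call overhead rather than the length of the string, so the absent-needle case is the more telling one for huge inputs.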



Source: https://stackoverflow.com/questions/19911508/python-speed-for-in-vs-regular-expression
