String matching performance: gcc versus CPython

谁都会走 提交于 2019-11-30 18:08:50

The python 3.4 code b'abc' in b'abcabc' (or b'abcabc'.__contains__(b'abc') as in your example) executes the bytes_contains method, which in turn calls the inlined function stringlib_find; which delegates the search to FASTSEARCH.

The FASTSEARCH function then uses a simplified Boyer-Moore search algorithm (Boyer-Moore-Horspool):

fast search/count implementation, based on a mix between boyer- moore and horspool, with a few more bells and whistles on the top. for some more background, see: http://effbot.org/zone/stringlib.htm

There are some modifications too, as noted by the comments:

note: fastsearch may access s[n], which isn't a problem when using Python's ordinary string types, but may cause problems if you're using this code in other contexts. also, the count mode returns -1 if there cannot possible be a match in the target string, and 0 if it has actually checked for matches, but didn't find any. callers beware!


The GNU C++ Standard Library basic_string<T>::find() implementation is as generic (and dumb) as possible; it just tries dumbly matching the pattern at each and every consecutive character position until it finds the match.


TL;DR: The reason why the C++ standard library is so slow compared to Python is because it tries to do a generic algorithm on top of std::basic_string<char>, but fails to do it efficiently for the more interesting cases; whereas in Python the programmer gets the most efficient algorithms on case-by-case basis for free.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!