What's the most efficient way to find one of several substrings in Python?

萌比男神i 2020-12-05 04:19

I have a list of possible substrings, e.g. ['cat', 'fish', 'dog']. In practice, the list contains hundreds of entries.

I'm processing a string,

6 answers
  •  星月不相逢
    2020-12-05 04:57

    I just want to point out the time difference between DisplacedAussie's answer and Tom's answer. Both were fast when used once, so you shouldn't have any noticeable wait for either, but when you time them:

    import random
    import re
    import string

    # Build 2000 random 10-character alphanumeric words to search for.
    words = []
    letters_and_digits = string.ascii_letters + string.digits
    for i in range(2000):
        chars = [random.choice(letters_and_digits) for j in range(10)]
        words.append("".join(chars))
    # The words are purely alphanumeric, so the pattern needs no escaping.
    search_for = re.compile("|".join(words))
    first, middle, last = words[0], words[len(words) // 2], words[-1]
    search_string = "%s, %s, %s" % (last, middle, first)

    def _search(search_for, search_string):
        # One compiled alternation; the regex engine finds the earliest match.
        match_obj = search_for.search(search_string)
        # Note, if no match, match_obj is None
        if match_obj is not None:
            return (match_obj.start(), match_obj.group())

    def _map(search_for, search_string):
        # Test every word individually, then keep the one with the lowest index.
        candidates = search_for.pattern.split("|")
        found = [(search_string.index(x), x) for x in candidates if x in search_string]
        if found:
            return min(found, key=lambda x: x[0])


    if __name__ == '__main__':
        from timeit import Timer

        t = Timer("_search(search_for, search_string)", "from __main__ import _search, search_for, search_string")
        print(_search(search_for, search_string))
        print(t.timeit())

        t = Timer("_map(search_for, search_string)", "from __main__ import _map, search_for, search_string")
        print(_map(search_for, search_string))
        print(t.timeit())
    

    Outputs:

    (0, '841EzpjttV')
    14.3660159111
    (0, '841EzpjttV')
    # I couldn't wait this long
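
The second timing is slow to finish partly because `Timer.timeit()` runs the statement 1,000,000 times by default. The following is a minimal sketch of the same comparison with a much smaller `number`, which should be enough for a rough comparison. The helper names (`regex_search`, `scan_search`), the generated word list, and the choice of 1000 iterations are illustrative assumptions, not part of the benchmark above.

```python
import random
import re
import string
from timeit import timeit

# Illustrative data: 2000 random 10-character alphanumeric words.
alphabet = string.ascii_letters + string.digits
words = ["".join(random.choice(alphabet) for _ in range(10)) for _ in range(2000)]
pattern = re.compile("|".join(words))  # alphanumeric only, so no escaping needed
text = "%s, %s, %s" % (words[-1], words[len(words) // 2], words[0])


def regex_search():
    """Return (index, word) of the earliest match using one compiled regex."""
    m = pattern.search(text)
    return (m.start(), m.group()) if m else None


def scan_search():
    """Return (index, word) of the earliest match by testing every word in turn."""
    found = [(text.index(w), w) for w in words if w in text]
    return min(found) if found else None


# number=1000 instead of the default 1,000,000 keeps both timings short.
regex_time = timeit(regex_search, number=1000)
scan_time = timeit(scan_search, number=1000)
print("regex: %.4fs, scan: %.4fs" % (regex_time, scan_time))
```

The absolute numbers depend on the machine and Python version, so they are only meaningful relative to each other.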
    

    I would go with Tom's answer, for both readability and speed.
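
For real substring lists, such as the `['cat', 'fish', 'dog']` example in the question, the entries may contain characters that are special in regular expressions. The following is a small sketch of the compiled-alternation approach with `re.escape` applied to each entry. The function name `find_first` is hypothetical and chosen only for illustration.

```python
import re


def find_first(substrings, text):
    """Return (index, substring) for the earliest match in text, or None."""
    # re.escape keeps characters such as '.' or '|' from being read as regex syntax.
    pattern = re.compile("|".join(re.escape(s) for s in substrings))
    match = pattern.search(text)
    if match is None:
        return None
    return (match.start(), match.group())


print(find_first(["cat", "fish", "dog"], "my dog chased a fish"))  # (3, 'dog')
print(find_first(["a.b", "x|y"], "value x|y here"))  # (6, 'x|y')
print(find_first(["cat"], "no pets here"))  # None
```

If the same list is searched many times, it is probably worth compiling the pattern once outside the function. Note also that when two entries can match at the same position, the alternation picks the one listed first rather than the longest one, so sorting the list by descending length may be appropriate if the longest match is wanted.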
