What's the most efficient way to find one of several substrings in Python?

前端 未结 6 2296
萌比男神i
萌比男神i 2020-12-05 04:19

I have a list of possible substrings, e.g. [\'cat\', \'fish\', \'dog\']. In practice, the list contains hundreds of entries.

I\'m processing a string,

6条回答
  •  误落风尘
    2020-12-05 05:08

    This is a vague, theoretical answer with no code provided, but I hope it can point you in the right direction.

    First, you will need a more efficient lookup for your substring list. I would recommend some sort of tree structure. Start with a root, then add an 'a' node if any substrings start with 'a', add a 'b' node if any substrings start with 'b', and so on. For each of these nodes, keep adding subnodes.

    For example, if you have a substring with the word "ant", you should have a root node, a child node 'a', a grandchild node 'n', and a great grandchild node 't'.

    Nodes should be easy enough to make.

    class Node(object):
        children = []
    
        def __init__(self, name):
            self.name = name
    

    where name is a character.

    Iterate through your strings letter by letter. Keep track of which letter you're on. At each letter, try to use the next few letters to traverse the tree. If you're successful, your letter number will be the position of the substring, and your traversal order will indicate the substring that was found.

    Clarifying edit: DFAs should be much faster than this method, and so I should endorse Tom's answer. I'm only keeping this answer up in case your substring list changes often, in which case using a tree might be faster.

提交回复
热议问题