I have a list of possible substrings, e.g. [\'cat\', \'fish\', \'dog\']. In practice, the list contains hundreds of entries.
I\'m processing a string,
This is a vague, theoretical answer with no code provided, but I hope it can point you in the right direction.
First, you will need a more efficient lookup for your substring list. I would recommend some sort of tree structure. Start with a root, then add an 'a' node if any substrings start with 'a', add a 'b' node if any substrings start with 'b', and so on. For each of these nodes, keep adding subnodes.
For example, if you have a substring with the word "ant", you should have a root node, a child node 'a', a grandchild node 'n', and a great grandchild node 't'.
Nodes should be easy enough to make.
class Node(object):
children = []
def __init__(self, name):
self.name = name
where name is a character.
Iterate through your strings letter by letter. Keep track of which letter you're on. At each letter, try to use the next few letters to traverse the tree. If you're successful, your letter number will be the position of the substring, and your traversal order will indicate the substring that was found.
Clarifying edit: DFAs should be much faster than this method, and so I should endorse Tom's answer. I'm only keeping this answer up in case your substring list changes often, in which case using a tree might be faster.