发表新帖

发表新帖

What's the most efficient way to find one of several substrings in Python?

前端未结

关注

 6  2296

萌比男神i 2020-12-05 04:19

I have a list of possible substrings, e.g. [\'cat\', \'fish\', \'dog\']. In practice, the list contains hundreds of entries.

I\'m processing a string,

6条回答

误落风尘 (楼主)

2020-12-05 05:08
This is a vague, theoretical answer with no code provided, but I hope it can point you in the right direction.

First, you will need a more efficient lookup for your substring list. I would recommend some sort of tree structure. Start with a root, then add an 'a' node if any substrings start with 'a', add a 'b' node if any substrings start with 'b', and so on. For each of these nodes, keep adding subnodes.

For example, if you have a substring with the word "ant", you should have a root node, a child node 'a', a grandchild node 'n', and a great grandchild node 't'.

Nodes should be easy enough to make.
```
class Node(object):
    children = []

    def __init__(self, name):
        self.name = name
```
where name is a character.

Iterate through your strings letter by letter. Keep track of which letter you're on. At each letter, try to use the next few letters to traverse the tree. If you're successful, your letter number will be the position of the substring, and your traversal order will indicate the substring that was found.

Clarifying edit: DFAs should be much faster than this method, and so I should endorse Tom's answer. I'm only keeping this answer up in case your substring list changes often, in which case using a tree might be faster.
0 讨论(0)

查看其它6个回答
发布评论:

提交评论
- 加载中...

热议问题