Remove substrings inside a list with better than O(n^2) complexity

清酒与你 2021-02-02 18:21

I have a list with many words (100,000+), and what I'd like to do is remove every word that is a substring of another word in the list.

So for simplicity, let's imagine that I have

4 Answers
  •  我在风中等你
    2021-02-02 18:39

    Build the set of all (unique) proper substrings of every word first, then keep only the words that are not in that set:

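    def substrings(s):
        """Return every proper substring of s (all substrings except s itself)."""
        length = len(s)
        return {s[i:j + 1] for i in range(length) for j in range(i, length)} - {s}


    def remove_substrings(words):
        """Return the words that are not a proper substring of any word in words."""
        # Collect the proper substrings of every word into one set.
        subs = set()
        for word in words:
            subs |= substrings(word)

        # A single pass over the words; each check is a hash lookup in subs.
        return {w for w in words if w not in subs}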
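
As a quick check of the approach above, here is a minimal, self-contained sketch that repeats the two functions and runs them on a small word list. The sample words are hypothetical (the example in the question is cut off), so treat them only as an illustration.

```python
def substrings(s):
    # Every substring of s except s itself.
    length = len(s)
    return {s[i:j + 1] for i in range(length) for j in range(i, length)} - {s}


def remove_substrings(words):
    # Union of the proper substrings of all words.
    subs = set()
    for word in words:
        subs |= substrings(word)
    # Keep only the words that never occur inside another word.
    return {w for w in words if w not in subs}


# Hypothetical sample data, not taken from the question.
words = ["foo", "foobar", "bar", "baz"]
result = remove_substrings(words)
print(sorted(result))  # ['baz', 'foobar']
```

Note that a word of length L has up to L(L+1)/2 - 1 proper substrings, so the set grows quadratically with word length rather than with the number of words. That is usually acceptable for ordinary words, but it can use a lot of memory if the list contains very long strings.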
    
