I have a list with many words (100,000+), and what I'd like to do is remove every word that is a substring of some other word in the list.
So for simplicity, let's imagine that I have
Build the set of all unique proper substrings of every word first, then keep only the words that are not in that set:
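def substrings(s):
    """Return every proper substring of s (all contiguous slices except s itself)."""
    length = len(s)
    return {s[i:j + 1] for i in range(length) for j in range(i, length)} - {s}


def remove_substrings(words):
    """Return the words that are not a proper substring of any word in the list."""
    subs = set()
    for word in words:
        subs |= substrings(word)
    # The result is a set, so input order and duplicate words are not preserved.
    return {w for w in words if w not in subs}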
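If the original order of the list matters, the same idea can return a list instead of a set. The following is a minimal sketch under that assumption. The name `remove_substrings_ordered` and the sample words are hypothetical illustrations, not part of the question. Here `substrings` uses the slice bounds `s[i:j]` with `j` running to `len(s)`, which yields the same set as the version above.

```python
def substrings(s):
    """Return every proper substring of s (all contiguous slices except s itself)."""
    n = len(s)
    return {s[i:j] for i in range(n) for j in range(i + 1, n + 1)} - {s}


def remove_substrings_ordered(words):
    """Hypothetical order-preserving variant.

    Keeps the words that are not a proper substring of any word in the list,
    in their original order, with duplicate words dropped.
    """
    subs = set()
    for word in words:
        subs |= substrings(word)

    seen = set()
    result = []
    for w in words:
        # Skip words that occur inside a longer word, and repeats of kept words.
        if w not in subs and w not in seen:
            seen.add(w)
            result.append(w)
    return result


words = ["hello", "hell", "ell", "world", "or", "low"]
print(remove_substrings_ordered(words))  # ['hello', 'world', 'low']
```

A word of length L contributes on the order of L² substrings, so the `subs` set grows with word length. For ordinary dictionary words this is usually manageable even for 100,000+ entries, but very long strings would make it expensive.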