Question
I have a set (not a list) of strings (words). It is a big one. (It's extracted from images with OpenCV and Tesseract, so there's no reliable way to predict its contents.)
At some point while working with this set, I need to find out whether it contains at least one word that begins with the part I'm currently processing. So it's something like this (NOT actual code):
if exists(word.startswith(word_part) in word_set) then continue else break
There is a very good answer on how to find all strings in a list that start with a given prefix:
result = [s for s in string_list if s.startswith(lookup)]
or
result = filter(lambda s: s.startswith(lookup), string_list)
But they return a list or an iterator of all strings found.
I only need to find out whether any such string exists in the set, not get them all.
Performance-wise, it seems kinda stupid to build a list, take its len, check whether it's more than zero, and then just drop that list.
Is there a better / faster / cleaner way?
Answer 1:
Your pseudocode is very close to real code!
if any(word.startswith(word_part) for word in word_set):
    continue
else:
    break
any() returns as soon as it finds one true element, so it does not scan the rest of the set once a match is found.
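To make the short-circuit behavior visible, here is a minimal sketch. The helper names has_prefix and count_checks_until_match are made up for illustration and are not part of the original answer. The counting version takes a list rather than a set only so that the iteration order, and therefore the count, is predictable.

```python
def has_prefix(words, prefix):
    """Return True if at least one word in `words` starts with `prefix`."""
    return any(word.startswith(prefix) for word in words)


def count_checks_until_match(words, prefix):
    """Return (found, checked): whether a match exists and how many
    words any() inspected before it stopped."""
    checked = 0

    def check(word):
        nonlocal checked
        checked += 1
        return word.startswith(prefix)

    found = any(check(word) for word in words)
    return found, checked


word_list = ["apple", "banana", "cherry", "date"]
print(has_prefix(set(word_list), "ban"))            # True
print(count_checks_until_match(word_list, "ban"))   # (True, 2)
```

Only two of the four words are inspected, because any() stops consuming the generator at the first match. If nothing matches, every word is still checked once.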
Answer 2:
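You need yield:
def find_word(word_set, letter):
    # Lazily yield each word that starts with the given prefix.
    for word in word_set:
        if word.startswith(letter):
            yield word
    # Sentinel: lets next() return None instead of raising StopIteration.
    yield None

# Compare against None so that a matching empty string still counts.
if next(find_word(word_set, letter)) is not None:
    print('word exists')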
yield produces words lazily, so a single next() call runs the loop only until the first match and returns just that one word.
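The same lazy lookup can also be written without a separate generator function, by passing a generator expression and a default value to the built-in next(). This is a sketch of an alternative, not the answerer's code. The name first_with_prefix is hypothetical.

```python
def first_with_prefix(words, prefix):
    """Return the first word found that starts with `prefix`, or None
    if no word matches. Iteration stops at the first match."""
    return next((word for word in words if word.startswith(prefix)), None)


word_set = {"apple", "banana", "cherry"}
match = first_with_prefix(word_set, "ch")
if match is not None:
    print("word exists:", match)
```

This form is useful when the matching word itself is needed. When only a yes/no answer is required, any() expresses the intent more directly.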
Source: https://stackoverflow.com/questions/59385512/fast-way-to-find-if-list-of-words-contains-at-least-one-word-that-starts-with-ce