Fast way to find if list of words contains at least one word that starts with certain letters (not “find ALL words”!)

Posted by 懵懂的女人 on 2020-01-15 04:03:56

Question


I have a set (not a list) of strings (words). It is a big one. (It's extracted from images with OpenCV and Tesseract, so there's no reliable way to predict its contents.)

At some point while working with this set, I need to find out whether it contains at least one word that begins with the part I'm currently processing. So it's something like this (NOT actual code):

if exists(word.startswith(word_part) in word_set) then continue else break

There is a very good answer on how to find all strings in a list that start with something here:

result = [s for s in string_list if s.startswith(lookup)]

or

result = filter(lambda s: s.startswith(lookup), string_list)

But they return a list or an iterator of all the strings found. I only need to know whether any such string exists in the set, not to get them all. Performance-wise, it seems wasteful to build a list, take its len to check that it's greater than zero, and then just drop that list.

Is there a better / faster / cleaner way?


Answer 1:


Your pseudocode is very close to real code!

if any(word.startswith(word_part) for word in word_set):
    continue
else:
    break

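any returns True as soon as it finds one truthy element and stops iterating, so it never scans more of the set than it has to. Only when no word matches does it check every word.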
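The short-circuit behaviour can be checked with a small sketch. The helper names has_prefix and counting, and the sample words, are made up for illustration; a list is used instead of a set only so that the iteration order (and therefore the count) is predictable.

```python
def has_prefix(words, prefix):
    """Return True if at least one word in `words` starts with `prefix`."""
    return any(word.startswith(prefix) for word in words)


def counting(words, counter):
    """Yield words unchanged while counting how many were consumed."""
    for word in words:
        counter[0] += 1
        yield word


# Hypothetical sample data; a list keeps the iteration order fixed.
words = ["apple", "banana", "band", "cherry"]
counter = [0]

found = has_prefix(counting(words, counter), "ban")
print(found)       # True
print(counter[0])  # 2: "apple" and "banana" were examined, then any() stopped
```

With a real set the number of words examined before a hit depends on the set's internal order, but any still stops at the first match.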




Answer 2:


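You can use a generator with yield:

def find_word(word_set, letter):
    """Lazily yield each word in word_set that starts with letter."""
    for word in word_set:
        if word.startswith(letter):
            yield word

# next() pulls only the first match; the None default covers "no match",
# so no sentinel value has to be yielded by the generator itself.
if next(find_word(word_set, letter), None) is not None:
    print('word exists')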

yield produces words lazily: the generator body runs only when a value is requested, so a single call to next makes it scan just far enough to produce the first matching word.
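To make the laziness visible, here is a minimal sketch of a generator that records every word it looks at. The examined parameter and the sample words are additions for illustration only and are not part of the answer's function; a list is used so the order is deterministic.

```python
def find_word(words, prefix, examined):
    """Lazily yield matching words, logging each word that is inspected."""
    for word in words:
        examined.append(word)  # illustrative bookkeeping only
        if word.startswith(prefix):
            yield word


# Hypothetical sample data.
words = ["apple", "banana", "band", "cherry"]
examined = []

matches = find_word(words, "ban", examined)
before_first = list(examined)   # creating the generator runs no code yet

first = next(matches, None)     # runs until the first yield
after_first = list(examined)

second = next(matches, None)    # resumes right after the previous yield
after_second = list(examined)

print(before_first)             # []
print(first, after_first)       # banana ['apple', 'banana']
print(second, after_second)     # band ['apple', 'banana', 'band']
```

For a pure existence check only the first next call is needed, and the rest of the collection is never visited.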



Source: https://stackoverflow.com/questions/59385512/fast-way-to-find-if-list-of-words-contains-at-least-one-word-that-starts-with-ce
