most efficient way to find partial string matches in large file of strings (python)

前端 未结 3 664
眼角桃花
眼角桃花 2021-01-12 19:46

I downloaded the Wikipedia article titles file which contains the name of every Wikipedia article. I need to search for all the article titles that may be a possible match.

3条回答
  •  长情又很酷
    2021-01-12 20:05

    If you've got a fixed data set and variable queries, then the usual technique is to reorganise the data set into something that can be searched more easily. At an abstract level, you could break up each article title into individual lowercase words, and add each of them to a Python dictionary data structure. Then, whenever you get a query, convert the query word to lower case and look it up in the dictionary. If each dictionary entry value is a list of titles, then you can easily find all the titles that match a given query word.

    This works for straightforward words, but you will have to consider whether you want to do matching on similar words, such as finding "smoking" when the query is "smoke".

提交回复
热议问题