Algorithm wanted: Find all words of a dictionary that are similar to words in a free text

迷失自我 2020-12-24 09:39

We have a list of about 150,000 words. When the user enters free text, the system should present a list of words from the dictionary that are very close to words in t

4 answers
  •  野趣味 (OP)
    2020-12-24 10:15

    You would likely want to use an algorithm which calculates the Levenshtein distance.
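As a point of reference, here is a minimal sketch of the Levenshtein distance in Python, using the standard two-row dynamic-programming formulation. The function name `levenshtein` is just an illustrative choice, not something from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]
```

Each call costs time proportional to the product of the two word lengths, which is why running it against every dictionary entry for every input word gets expensive.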

    However, since your data set is quite large, and you'll be comparing lots of words against it, a direct implementation of typical algorithms that do this won't be practical.

    In order to find words in a reasonable amount of time, you will have to index your set of words in some way that facilitates fuzzy string matching.

    One of these indexing methods would be to use a suffix tree. Another approach would be to use n-grams.

    I would lean towards using a suffix tree since I find it easier to wrap my head around it and I find it more suited to the problem.
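For comparison, the n-gram approach mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`ngrams`, `build_index`, `candidates`), the `$` padding character, the trigram size, and the `min_shared` threshold are all hypothetical choices, not part of the original answer. The idea is to index every dictionary word by its character trigrams, then keep only the words that share enough trigrams with the query.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Set of character n-grams of word, padded so that the first and
    last letters also contribute n-grams of their own."""
    pad = "$" * (n - 1)  # assumed padding character, not in the source
    padded = pad + word.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(words):
    """Map each n-gram to the set of dictionary words containing it."""
    index = defaultdict(set)
    for word in words:
        for gram in ngrams(word):
            index[gram].add(word)
    return index


def candidates(index, query, min_shared=2):
    """Dictionary words sharing at least min_shared n-grams with query."""
    counts = defaultdict(int)
    for gram in ngrams(query):
        for word in index.get(gram, ()):
            counts[word] += 1
    return {word for word, shared in counts.items() if shared >= min_shared}


index = build_index(["algorithm", "logarithm", "dictionary", "distance"])
print(candidates(index, "algorthm", min_shared=4))
```

The index is built once for the whole dictionary. Each word of the free text then touches only the buckets for its own n-grams, and the surviving candidates can be confirmed with an exact edit-distance check.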
