Algorithm wanted: Find all words of a dictionary that are similar to words in a free text

迷失自我 2020-12-24 09:39

We have a list of about 150,000 words. When the user enters free text, the system should present a list of words from the dictionary that are very close to words in t

4 answers

野趣味 (original poster)

2020-12-24 10:15

You would likely want to use an algorithm which calculates the Levenshtein distance.
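As a point of reference, here is a minimal sketch of the classic dynamic-programming computation of Levenshtein distance. The function name `levenshtein` is just an illustrative choice, and unit cost for every insertion, deletion, and substitution is an assumption; adjust the costs if your notion of "similar" differs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b with unit-cost edits (assumed cost model)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]
```

Each call takes time proportional to `len(a) * len(b)`, which is cheap for one pair of words but adds up quickly when repeated across a whole dictionary.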

However, since your data set is quite large and you'll be comparing many words against it, a direct implementation won't be practical: computing the distance between every word of the input text and each of the 150,000 dictionary entries is too slow.

In order to find words in a reasonable amount of time, you will have to index your set of words in some way that facilitates fuzzy string matching.

One of these indexing methods would be to use a suffix tree. Another approach would be to use n-grams.
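To make the n-gram idea concrete, here is a rough sketch, not a tuned implementation. It indexes every dictionary word by its character trigrams, uses the index to collect candidates that share at least one trigram with the query, and only then verifies each candidate with an edit-distance check. The names `trigrams`, `build_index`, and `find_similar`, the `$` padding character, and the default `max_distance=1` are all assumptions made for illustration. Note that the "at least one shared trigram" filter can miss very short words, so treat it as a heuristic.

```python
from __future__ import annotations
from collections import defaultdict

def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]

def trigrams(word: str) -> set[str]:
    """Character trigrams of the word, padded with '$' (assumed marker)."""
    padded = f"${word.lower()}$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def build_index(words: list[str]) -> dict[str, set[str]]:
    """Map each trigram to the set of dictionary words containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index

def find_similar(query: str, index: dict[str, set[str]],
                 max_distance: int = 1) -> list[tuple[str, int]]:
    """Return (word, distance) pairs within max_distance of the query."""
    # Step 1: cheap filter, any word sharing at least one trigram with the query.
    candidates: set[str] = set()
    for gram in trigrams(query):
        candidates |= index.get(gram, set())
    # Step 2: exact verification on the (hopefully small) candidate set.
    results = []
    for word in candidates:
        distance = levenshtein(query.lower(), word.lower())
        if distance <= max_distance:
            results.append((word, distance))
    return sorted(results, key=lambda pair: (pair[1], pair[0]))
```

You would build the index once from the dictionary, then call `find_similar` for each word of the free text. Requiring more than one shared trigram would prune candidates harder, at the cost of missing more short words.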

I would lean towards a suffix tree, since I find it easier to reason about and better suited to the problem.
