Creating a “spell check” that checks against a database with a reasonable runtime

后端 未结 6 1721
Happy的楠姐
Happy的楠姐 2021-01-31 05:22

I\'m not asking about implementing the spell check algorithm itself. I have a database that contains hundreds of thousands of records. What I am looking to do is checking a user

6条回答
  •  Happy的楠姐
    2021-01-31 05:54

    As Darcara said, a BK-Tree is a good first take. They are very easy to implement. There are several free implementations easily found via Google, but a better introduction to the algorithm can be found here: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees.

    Unfortunately, calculating the Levenshtein distance is pretty costly, and you'll be doing it a lot if you're using a BK-Tree with a large dictionary. For better performance, you might consider Levenshtein Automata. A bit harder to implement, but also more efficient, and they can be used to solve your problem. The same awesome blogger has the details: http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata. This paper might also be interesting: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652.

提交回复
热议问题