Is there a faster (less precise) algorithm than Levenshtein for string distance?

こ雲淡風輕ζ submitted on 2019-11-28 01:17:51

Question


I want to compute the Levenshtein distance, but much faster, because I'm building a real-time application. The computation can terminate as soon as the distance is greater than 10.


Answer 1:


The Levenshtein distance metric allows insertion, deletion, and substitution operations. If you're looking for a faster but less precise metric, you can use the longest common subsequence distance (which allows only insertion and deletion) or even the Hamming distance (which allows only substitution, and so is defined only for strings of equal length).
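
To make the two alternative metrics concrete, here is a minimal Python sketch of both. The function names `hamming_distance` and `lcs_distance` are made up for this illustration and do not come from any library. Note that this plain dynamic-programming version of the LCS distance is still quadratic in the string lengths, so any speed gain over Levenshtein depends on using a more specialized LCS algorithm; only the Hamming distance is linear as written.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which a and b differ (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def lcs_distance(a: str, b: str) -> int:
    """Insert/delete-only edit distance: len(a) + len(b) - 2 * LCS(a, b)."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]
```

For example, `hamming_distance("karolin", "kathrin")` counts three differing positions, while `lcs_distance("kitten", "sitting")` is 5 because a substitution has to be paid for as one deletion plus one insertion.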

However, I recommend that you try to optimize your Levenshtein distance algorithm instead, as it gives the best results.
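
One possible optimization, given that the question allows stopping once the distance exceeds 10, is a cutoff. The sketch below is only an illustration under that assumption: `bounded_levenshtein` and its `max_dist` parameter are hypothetical names, and returning `None` for "over the limit" is a design choice made here, not something the answer specifies. It keeps only two rows of the table, rejects pairs whose length difference already exceeds the limit, and aborts as soon as every entry in the current row is above the limit.

```python
from typing import Optional


def bounded_levenshtein(a: str, b: str, max_dist: int = 10) -> Optional[int]:
    """Return the Levenshtein distance, or None if it exceeds max_dist."""
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > max_dist:
        return None
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # Row minima never decrease, so the final result cannot
        # drop back under the limit once a whole row is above it.
        if min(curr) > max_dist:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else None
```

A further step in the same direction would be to compute only the diagonal band of width `2 * max_dist + 1` in each row, which reduces the work per row from the full string length to a constant for a fixed limit.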




Answer 2:


Judging from the comments, people seem to be pretty happy with Sift3.

http://sift.codeplex.com




Answer 3:


If you want to compare UTF-8 content, use Sift4:

http://siderite.blogspot.com/2014/11/super-fast-and-accurate-string-distance.html

I also prepared a jsPerf that shows the performance difference between these libraries: http://jsperf.com/levenshtein-perf



Source: https://stackoverflow.com/questions/6178708/is-there-a-faster-less-precise-algorithm-than-levenshtein-for-string-distance
