Percentage rank of matches using Levenshtein Distance matching

既然无缘 2020-12-14 02:19

I am trying to match a single search term against a dictionary of possible matches using a Levenshtein distance algorithm. The algorithm returns a distance expressed as numb

6 answers

执念已碎 (original poster)

2020-12-14 02:42
My approach to this problem was to calculate the maximum number of allowed operations, since the Levenshtein distance is itself a count of edit operations. The formula I used is:
```
percent = 0.75; // at least 75% of string must match
maxOperationsFirst = s1.length() - s1.length() * percent;
maxOperationsSecond = s2.length() - s2.length() * percent;
maxOperations = round(min(maxOperationsFirst, maxOperationsSecond));
```
It calculates the maximum number of operations for each string; I believe the calculation is easy to understand. I take the minimum of the two results and round it to the closest whole number. You can skip this step and use the maximum operations value from just one of the strings; it really depends on your data.
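As a rough illustration, the formula above could be written in Python like this. The function name `max_operations` is made up for this sketch, and rounding halves upward is an assumption about what "closest whole number" means (Python's built-in `round` rounds halves to the nearest even number, so it is not used here).

```python
import math


def max_operations(s1: str, s2: str, percent: float = 0.75) -> int:
    """Largest edit distance still treated as a match (hypothetical helper).

    percent is the fraction of each string that must match, e.g. 0.75.
    """
    max_ops_first = len(s1) - len(s1) * percent
    max_ops_second = len(s2) - len(s2) * percent
    # Take the stricter (smaller) budget of the two strings.
    budget = min(max_ops_first, max_ops_second)
    # Round half up (assumption); built-in round() would round 2.5 down to 2.
    return math.floor(budget + 0.5)


# "kitten" has 6 characters: 6 - 6 * 0.75 = 1.5
# "sitting" has 7 characters: 7 - 7 * 0.75 = 1.75
# min(1.5, 1.75) = 1.5, which rounds to 2 allowed operations.
print(max_operations("kitten", "sitting"))  # 2
```

With a 75% requirement, short strings get a very small budget (a 4-character word allows only 1 operation), so the threshold is worth tuning against your own data.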

Once you have the maximum number of operations, you can compare it with the Levenshtein result to decide whether the string is acceptable. This approach works with any extended Levenshtein method, for example the Damerau–Levenshtein distance, which counts a transposition such as test -> tset as a single operation. That is quite useful when checking user input, where such misspellings occur very often.
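Here is a minimal Python sketch of that comparison, not a definitive implementation. The names `levenshtein`, `osa_distance`, and `is_acceptable` are made up for illustration. `osa_distance` is the restricted Damerau–Levenshtein variant (optimal string alignment), used here as a simple stand-in for the transposition-aware distance. Accepting a match when the distance is less than or equal to the budget is also an assumption.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as one operation.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_acceptable(s1: str, s2: str, distance=levenshtein, percent: float = 0.75) -> bool:
    """True if the distance fits within the budget (<= is an assumption)."""
    budget = min(len(s1) - len(s1) * percent, len(s2) - len(s2) * percent)
    max_ops = math.floor(budget + 0.5)  # round half up (assumption)
    return distance(s1, s2) <= max_ops


print(is_acceptable("test", "tset"))                         # False: 2 edits, budget 1
print(is_acceptable("test", "tset", distance=osa_distance))  # True: 1 transposition
```

The test -> tset pair shows why the choice of distance matters: plain Levenshtein needs two substitutions and rejects it, while the transposition-aware variant counts one operation and accepts it.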

I hope this gives you an idea of how to solve this problem.