Modifying Levenshtein Distance for positional Bias

独自空忆成欢 提交于 2019-12-31 03:10:09

问题


I am using the Levenshtein distance algorithm to compare a company name provided as a user input against a database of known company names to find closest match. By itself, the algorithm works okay, but I want to build in a Bias so that the edit distance is considered lower if the initial parts of the strings match.

For Example, if the search criteria is "ABCD", then both "ABCD Co." and "XYX ABCD" have identical Edit Distance. However I want to add weight to the fact that the initial parts of the first string matches the search criteria more closely than the second string.

One way of doing this might be to modify the insert/delete/replace costs to be higher at the beginning of the strings and lower towards the end. Does anyone have an example of a successful implementation of this? Is using Levenshtein distance still the best way to do what I am trying to achieve? Is my assumption of the approach accurate?

UPDATE: For my immediate purposes I have decided to forgo the above and instead use the Jaro Winkler edit distance which seems to solve the problem. However I will leave this open for further inputs.


回答1:


What you're looking for looks like a Smith-Waterman local alignment: http://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm



来源:https://stackoverflow.com/questions/10425238/modifying-levenshtein-distance-for-positional-bias

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!