How to compute multiple related Levenshtein distances?

こ雲淡風輕ζ submitted on 2021-02-08 11:33:40

Question


Given two strings of equal length, Levenshtein distance gives the minimum number of transformations needed to turn the first string into the second. However, I'd like to find a way to adapt the algorithm to multiple pairs of strings, given that they were all generated in the same way.


Answer 1:


Reading the comments, it appears that this is the problem:

You are given a set of pairs of strings, all of the same length. Each pair consists of the input to some function and the output that function produced, so for the pair A,B we know that f(A)=B. The goal is to reverse engineer f() from a large set of A,B pairs.

Running Levenshtein distance over the entire set will, at most, tell you the largest number of edits that f() must be able to perform on any one pair.
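To make that limitation concrete, here is a minimal sketch that computes the distance for every pair and takes the maximum. The function name `levenshtein` and the sample `pairs` are made up for illustration; they are not from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical input/output pairs, assumed to come from the same unknown f()
pairs = [("ABCDE", "xABCD"), ("HELLO", "xHELL"), ("12345", "x1234")]

distances = [levenshtein(a, b) for a, b in pairs]
largest = max(distances)
print(distances, largest)
```

The result is only a count of edits per pair. It says nothing about which positions changed or how, which is why it helps so little in recovering f().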

A better starting point would be Hamming distance (adapted to compare arbitrary characters rather than bits) or Jaccard similarity, used to identify which positions in the strings never change in any of the pairs. You are then left with only the positions that do change.
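One way to sketch the position-by-position idea, assuming all strings have the same length as stated above. The helper names `hamming` and `unchanged_positions` and the sample pairs are illustrative assumptions, not part of the original answer.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def unchanged_positions(pairs):
    """Indices where input and output agree in every single pair."""
    length = len(pairs[0][0])
    stable = set(range(length))
    for a, b in pairs:
        # keep only the positions that are also unchanged in this pair
        stable &= {i for i in range(length) if a[i] == b[i]}
    return sorted(stable)


# Hypothetical pairs in which positions 1 and 4 are rewritten by f()
pairs = [("ABCDE", "AxCDy"), ("HELLO", "HzLLw"), ("12345", "1q34r")]

per_pair = [hamming(a, b) for a, b in pairs]
stable = unchanged_positions(pairs)
print(per_pair, stable)
```

With this sample data only indices 1 and 4 are left to explain, which narrows the search for f() far more than a single edit count does.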

This will fail if the letters shift, because a position-by-position comparison then reports almost every position as changed.

To detect a shift, you want to use global alignment (Needleman-Wunsch). You will then see something like "ABCDE"=>"xABCD", which shows that the input was shifted one position to the right in the output.
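A minimal sketch of Needleman-Wunsch with traceback follows. The scoring values (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch` are assumptions chosen for illustration; a real analysis may need different weights.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_in, aligned_out, total = needleman_wunsch("ABCDE", "xABCD")
print(aligned_in)   # -ABCDE
print(aligned_out)  # xABCD-
```

The leading gap in the aligned input and the trailing gap in the aligned output are the signature of a shift: "ABCD" survives intact but sits one position later in the output.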

Overall, I feel that Levenshtein distance will do very little to help you get at the original algorithm.



Source: https://stackoverflow.com/questions/4809525/how-to-compute-multiple-related-levenshtein-distances
