Levenshtein distance limit

Question

I have a maximum distance that I do not want to exceed, for example 2. Can I break out of the algorithm before it runs to completion, given that I know the maximum allowable distance?

Perhaps there are similar algorithms in which this can be done.

I need to reduce the running time of my program.

Answer 1:

If you do top-down dynamic programming (recursion plus memoization), you could pass the current cost as an extra parameter and return early if it exceeds 2. But I think this will be inefficient, because you will revisit states.

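If you do bottom-up DP, you fill the matrix row by row (you only have to keep the previous and the current row). If the row you have just finished contains only entries greater than 2, you can terminate early, because the smallest value in a row never decreases from one row to the next.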

Modify your source code according to my comment:

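for (var i = 1; i <= source1Length; i++)
{
    // Smallest value seen in row i. This assumes the usual initialization
    // matrix[i, 0] = i, so column 0 is part of the check.
    var rowMin = matrix[i, 0];

    for (var j = 1; j <= source2Length; j++)
    {
        var cost = (source2[j - 1] == source1[i - 1]) ? 0 : 1;

        matrix[i, j] = Math.Min(
            Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1),
            matrix[i - 1, j - 1] + cost);

        rowMin = Math.Min(rowMin, matrix[i, j]);
    }

    // modify here:
    // if every entry of matrix[i, ...] is > 2, the final distance is > 2 too.
    // After this break, do not read matrix[source1Length, source2Length].
    if (rowMin > 2)
        break;
}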
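
The same row-by-row idea can be sketched as a complete function that keeps only two rows. This is a minimal sketch, not the asker's code: the class name `EarlyExitLevenshtein`, the `maxDistance` parameter, and the convention of returning -1 when the limit is exceeded are all assumptions made for illustration.

```csharp
using System;

public static class EarlyExitLevenshtein
{
    // Hypothetical helper: returns the Levenshtein distance between a and b,
    // or -1 once it is certain that the distance exceeds maxDistance.
    public static int Distance(string a, string b, int maxDistance)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Row 0: building the first j characters of b from nothing takes j insertions.
        for (var j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            // Column 0: reducing the first i characters of a to nothing takes i deletions.
            current[0] = i;
            var rowMin = current[0];

            for (var j = 1; j <= b.Length; j++)
            {
                var cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
                rowMin = Math.Min(rowMin, current[j]);
            }

            // Row minima never decrease, so the final distance is at least rowMin.
            if (rowMin > maxDistance)
                return -1;

            // Reuse the two buffers instead of allocating a new row.
            (previous, current) = (current, previous);
        }

        var distance = previous[b.Length];
        return distance <= maxDistance ? distance : -1;
    }
}
```

With this sketch, `EarlyExitLevenshtein.Distance("kitten", "sitting", 3)` yields the real distance, while a limit of 2 gives the "too far" marker. Each row is still computed in full, so only the number of rows is reduced.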

Answer 2:

Yes, you can, and it does reduce the complexity.

The main thing to observe is that levenshtein_distance(a, b) >= |len(a) - len(b)|. That is, the distance can't be less than the difference in the lengths of the strings: at the very minimum you need to add characters to make them the same length.

Knowing this, you can ignore all the cells in the original matrix where |i-j| > max_distance, because cell (i, j) holds the distance between the first i characters of a and the first j characters of b. So you can modify your loops from

for (i in 0 -> len(a))
   for (j in 0 -> len(b))

to

for (i in 0 -> len(a))
   for (j in max(0, i-max_distance) -> min(len(b), i+max_distance))

You can keep the original matrix if it's easier for you, but you can also save space by using a matrix of size (len(a), 2*max_distance + 1), one column per diagonal in the band, and adjusting the indices.

Once every cost in the row you have just computed is greater than max_distance, you can stop the algorithm.

This gives you O(N*max_distance) complexity, where N is the string length. Since your max_distance is 2, the running time is effectively linear in N. You can also bail out at the start if |len(a)-len(b)| > max_distance.
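
Putting the three ideas together (length check up front, band-limited inner loop, stop when a whole row is over the limit) might look roughly like the following. It is a sketch under stated assumptions rather than a definitive implementation: the class name `BandedLevenshtein`, the -1 return convention, and the `tooFar` stand-in for cells outside the band are illustrative choices, and `maxDistance` is assumed to be non-negative. For simplicity it keeps two full-length rows instead of the compact band storage.

```csharp
using System;

public static class BandedLevenshtein
{
    // Hypothetical helper: returns the Levenshtein distance between a and b
    // if it is at most maxDistance (assumed non-negative), otherwise -1.
    public static int Distance(string a, string b, int maxDistance)
    {
        // The distance is at least the length difference, so bail out at the start.
        if (Math.Abs(a.Length - b.Length) > maxDistance)
            return -1;

        // Stand-in for cells outside the band: their true cost is at least
        // maxDistance + 1, which is all the band computation needs to know.
        var tooFar = maxDistance + 1;

        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            // Only columns with |i - j| <= maxDistance are computed.
            var from = Math.Max(1, i - maxDistance);
            var to = Math.Min(b.Length, i + maxDistance);

            // Left neighbour of the band: column 0 holds i, anything else is out of band.
            current[from - 1] = (from == 1) ? i : tooFar;
            var rowMin = current[from - 1];

            for (var j = from; j <= to; j++)
            {
                var cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
                rowMin = Math.Min(rowMin, current[j]);
            }

            // Right neighbour of the band, read by the next row as previous[j].
            if (to < b.Length)
                current[to + 1] = tooFar;

            // Every cell in this row is over the limit, so the result will be too.
            if (rowMin > maxDistance)
                return -1;

            (previous, current) = (current, previous);
        }

        var distance = previous[b.Length];
        return distance <= maxDistance ? distance : -1;
    }
}
```

Values above `maxDistance` inside the band may be inexact because of the stand-in cells, which is why the function reports -1 instead of returning them. Values at or below the limit are exact, since the edit path that produces them never leaves the band.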

Source: https://stackoverflow.com/questions/48901351/levenstein-distance-limit

Tags

algorithm

levenshtein-distance