Algorithm to find edit distance to all substrings

不思量自难忘° 2021-02-20 15:05

Given two strings s and t, I need to find the edit distance (Levenshtein distance) to t for each substring of s. Actually I need to

  •  不思量自难忘°
    2021-02-20 15:25

    Matching against the substrings of a given string is very easy. You take the normal Levenshtein algorithm and modify it slightly.

    FIRST: Instead of filling the first row of the matrix with 0,1,2,3,4,5,... you fill it entirely with zeros. (green rectangle)

    SECOND: Then you run the algorithm.

    THIRD: Instead of returning the last cell of the last row, you search for the smallest value in the last row and return it. (red rectangle)

    Example: needle: "aba", haystack: "c abba c" --> result = 1 (converting abba -> aba)
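
    The three steps can be sketched in Python roughly like this. The function name best_substring_distance is my own choice for illustration, and I assume the matrix has one row per needle prefix and one column per haystack prefix, so the "first row" is the one for the empty needle. The zeros in that row let a match start at any haystack position at no cost, and taking the minimum of the last row lets it end anywhere.

```python
def best_substring_distance(needle: str, haystack: str) -> int:
    """Smallest Levenshtein distance between needle and any substring of haystack."""
    # FIRST: the first row is all zeros instead of 0, 1, 2, 3, ...
    prev = [0] * (len(haystack) + 1)

    # SECOND: run the usual Levenshtein recurrence, keeping only two rows.
    for i, nc in enumerate(needle, start=1):
        # The first column keeps its usual value: i edits to reach the empty string.
        curr = [i]
        for j, hc in enumerate(haystack, start=1):
            cost = 0 if nc == hc else 1
            curr.append(min(
                prev[j] + 1,         # needle character left unmatched
                curr[j - 1] + 1,     # extra haystack character inside the match
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    # THIRD: return the smallest value in the last row, not its last cell.
    return min(prev)


print(best_substring_distance("aba", "c abba c"))  # prints 1
```

    Each cell of the last row is the best distance over all substrings that end at that haystack position, so the last row as a whole can be kept if you want one value per end position rather than only the overall minimum.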

    I tested it and it works.

    This is much faster than stepping through the string character by character, as you suggest in your question, because the matrix is built only once.
