Edit distance recursive algorithm — Skiena

执笔经年 2020-12-30 02:37

I'm reading The Algorithm Design Manual by Steven Skiena, and I'm on the dynamic programming chapter. He has some example code for edit distance and uses some functions w

6 Answers
  •  滥情空心
    2020-12-30 02:48

    Basically, it uses dynamic programming: the solution to the problem is constructed from solutions to subproblems, computed either bottom-up or top-down, and each subproblem's result is stored so that it is never recomputed.
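
    As a rough illustration of the top-down variant, here is a minimal Python sketch. It is not the book's code: the names edit_distance_top_down and solve are made up for this example, and unit cost for every operation is assumed. The cache is what prevents recomputation of a subproblem.

```python
from functools import lru_cache


def edit_distance_top_down(source: str, target: str) -> int:
    """Minimum number of unit-cost replace/insert/delete edits
    that turn source into target (assumed cost model)."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # solve(i, j) is the edit distance between source[:i] and target[:j].
        if i == 0:
            return j  # only insertions remain
        if j == 0:
            return i  # only deletions remain
        replace_cost = 0 if source[i - 1] == target[j - 1] else 1
        return min(
            solve(i - 1, j) + 1,                 # delete source[i - 1]
            solve(i, j - 1) + 1,                 # insert target[j - 1]
            solve(i - 1, j - 1) + replace_cost,  # match or replace
        )

    return solve(len(source), len(target))
```

    Each (i, j) pair is solved once, so the work is proportional to the product of the two string lengths. For long strings the recursion depth can exceed Python's default limit, in which case a bottom-up table is the safer choice.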

    The problem has a recursive structure over index pairs (i, j), where i and j are start (or end) indices in the two strings respectively.

    Here's an excerpt from this page that explains the algorithm well.

    Problem: Given two strings of sizes m and n and a set of operations, replace (R), insert (I) and delete (D), all at equal cost, find the minimum number of edits (operations) required to convert one string into the other.

    Identifying Recursive Methods:

    What will the subproblem be in this case? Consider finding the edit distance between parts of the strings, say short prefixes. Let us denote them as [1...i] and [1...j] for some 1 <= i <= m and 1 <= j <= n. This is clearly a smaller instance of the final problem; denote its cost as E(i, j). Our goal is to find E(m, n), the minimum cost for the full strings.

    Within the prefixes, we can right-align the strings in three ways: (i, -), (-, j) and (i, j). The hyphen symbol (-) represents no character. An example makes this clearer.

    Take the strings SUNDAY and SATURDAY. We want to convert SUNDAY into SATURDAY with the minimum number of edits. Let us pick i = 2 and j = 4, i.e. the prefix strings are SU and SATU respectively (assume the string indices start at 1). The rightmost characters can be aligned in three different ways.

    Case 1: Align the characters U and U. They are equal, so no edit is required. We are left with the problem of i = 1 and j = 3, E(i-1, j-1).

    Case 2: Align the rightmost character of the first string with no character from the second string. We need a deletion (D) here. We are left with the problem of i = 1 and j = 4, E(i-1, j).

    Case 3: Align the rightmost character of the second string with no character from the first string. We need an insertion (I) here. We are left with the problem of i = 2 and j = 3, E(i, j-1).

    Combining all the subproblems, the minimum cost of aligning the prefix strings ending at i and j is given by

    E(i, j) = min( [E(i-1, j) + D], [E(i, j-1) + I], [E(i-1, j-1) + R if the characters at i and j differ, E(i-1, j-1) otherwise] )

    We are not done yet. What are the base cases?

    When both strings have length 0, the cost is 0. When only one of the strings is empty, the number of edit operations needed equals the length of the non-empty string. Mathematically,

    E(0, 0) = 0, E(i, 0) = i, E(0, j) = j
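
    The recurrence and these base cases can be sketched bottom-up in Python as follows. This is an illustrative sketch rather than the book's code: the function name edit_distance_table is invented here, and D = I = R = 1 is assumed.

```python
def edit_distance_table(source: str, target: str) -> list[list[int]]:
    """Return the full table where table[i][j] is E(i, j), the edit
    distance between source[:i] and target[:j] (unit costs assumed)."""
    m, n = len(source), len(target)
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: E(i, 0) = i and E(0, j) = j (E(0, 0) = 0 is already set).
    for i in range(1, m + 1):
        table[i][0] = i
    for j in range(1, n + 1):
        table[0][j] = j

    # Fill row by row so every subproblem is ready before it is needed.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace_cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,                 # deletion (D)
                table[i][j - 1] + 1,                 # insertion (I)
                table[i - 1][j - 1] + replace_cost,  # match, or replace (R)
            )
    return table


table = edit_distance_table("SUNDAY", "SATURDAY")
print(table[2][4])  # E(2, 4): prefixes SU and SATU
print(table[6][8])  # E(m, n): the full strings
```

    For the example above this gives E(2, 4) = 2 (insert A and T) and E(6, 8) = 3 for the full strings.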

    I recommend going through this lecture for a good explanation.

    The function match() returns 1 if the two characters mismatch (so that one more move is added to the final answer), and 0 otherwise.
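
    To show where match() plugs into the recursion, here is a small Python sketch. Only the behaviour of match() comes from the description above; the helper indel() and the function name string_compare are assumptions made for this example, with a unit cost for every insertion or deletion.

```python
def match(c: str, d: str) -> int:
    """Cost of aligning c with d: 0 if they are equal, 1 if they mismatch."""
    return 0 if c == d else 1


def indel(c: str) -> int:
    """Cost of inserting or deleting c (hypothetical helper, unit cost assumed)."""
    return 1


def string_compare(s: str, t: str, i: int, j: int) -> int:
    """Edit distance between s[:i] and t[:j] by plain recursion.

    No results are cached, so the running time is exponential; this is
    only meant to mirror the recurrence, not to be used on long strings.
    """
    if i == 0:
        return sum(indel(ch) for ch in t[:j])  # insert what is left of t
    if j == 0:
        return sum(indel(ch) for ch in s[:i])  # delete what is left of s
    return min(
        string_compare(s, t, i - 1, j - 1) + match(s[i - 1], t[j - 1]),
        string_compare(s, t, i, j - 1) + indel(t[j - 1]),
        string_compare(s, t, i - 1, j) + indel(s[i - 1]),
    )
```

    Calling string_compare("SUNDAY", "SATURDAY", 6, 8) evaluates the whole problem; the match() term is the only place where the two characters are actually compared.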
