Modifying Levenshtein Distance algorithm to not calculate all distances

渐次进展 2020-12-31 19:40

I'm working on a fuzzy search implementation and as part of the implementation, we're using Apache's StringUtils.getLevenshteinDistance. At the moment, we're going for a

6 answers
  •  长发绾君心
    2020-12-31 20:15

    Someone has answered a very similar question here:

    Quote:
    I've done it a number of times. The way I do it is with a recursive depth-first tree-walk of the game tree of possible changes. There is a budget k of changes, that I use to prune the tree. With that routine in hand, first I run it with k=0, then k=1, then k=2 until I either get a hit or I don't want to go any higher.

    #include <stdbool.h>
    #include <string.h>

    /* the two strings being compared; set a, b and their lengths na, nb before calling walk() */
    static const char *a;
    static const char *b;
    static int na;
    static int nb;

    /* returns true if a[ia..] can be turned into b[ib..] with at most k edits */
    static bool walk(int ia, int ib, int k){
      /* if the budget is exhausted, prune the search */
      if (k < 0) return false;
      /* if at end of both strings we have a match */
      if (ia == na && ib == nb) return true;
      /* if the first characters match, continue walking with no reduction in budget */
      if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia+1, ib+1, k)) return true;
      /* if the first characters don't match, assume there is a 1-character replacement */
      if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia+1, ib+1, k-1)) return true;
      /* try assuming there is an extra character in a */
      if (ia < na && walk(ia+1, ib, k-1)) return true;
      /* try assuming there is an extra character in b */
      if (ib < nb && walk(ia, ib+1, k-1)) return true;
      /* if none of those worked, I give up */
      return false;
    }

    This is just the main part; there is more code in the original answer.
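    The quoted snippet only shows walk(); the outer loop the answer describes (run with k=0, then k=1, then k=2 until there is a hit or the budget gets too high) is not included. A minimal sketch of that loop in C, assuming the same global-variable setup as the snippet; the wrapper bounded_distance and its max_k cap are hypothetical names, not part of the original answer:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* the two strings being compared and their lengths (same globals as the quoted answer) */
static const char *a;
static const char *b;
static int na;
static int nb;

/* budget-pruned walk: true if a[ia..] can become b[ib..] with at most k edits */
static bool walk(int ia, int ib, int k) {
  if (k < 0) return false;                       /* budget exhausted: prune */
  if (ia == na && ib == nb) return true;         /* both strings consumed: match */
  if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia + 1, ib + 1, k)) return true;
  if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia + 1, ib + 1, k - 1)) return true; /* replacement */
  if (ia < na && walk(ia + 1, ib, k - 1)) return true;                                  /* extra char in a */
  if (ib < nb && walk(ia, ib + 1, k - 1)) return true;                                  /* extra char in b */
  return false;
}

/* Hypothetical driver: try budgets 0, 1, 2, ... up to max_k.
   Returns the edit distance if it is <= max_k, otherwise -1. */
static int bounded_distance(const char *s, const char *t, int max_k) {
  assert(max_k >= 0);
  a = s;
  b = t;
  na = (int)strlen(s);
  nb = (int)strlen(t);
  for (int k = 0; k <= max_k; k++) {
    if (walk(0, 0, k)) return k;   /* first budget that succeeds is the distance */
  }
  return -1;                       /* farther apart than max_k: give up */
}
```

    With this sketch, bounded_distance("kitten", "sitting", 3) returns 3, while bounded_distance("kitten", "sitting", 2) returns -1 without the full distance ever being computed. Because the first budget that succeeds is the smallest one, the result is exact whenever it is at most max_k. Each deeper pass repeats the work of the shallower ones and the branching grows quickly with k, so this approach suits small caps, which is typical for fuzzy search.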
