Levenshtein distance: how to better handle words swapping positions?

忘掉有多难 2021-01-30 02:22

I've had some success comparing strings using the PHP levenshtein function.

However, for two strings which contain substrings that have swapped positions, the algorithm

9 Answers
  •  半阙折子戏
    2021-01-30 03:20

    I believe this is a prime use case for a vector-space search engine.

    In this technique, each document becomes a vector with one dimension per distinct word in the entire corpus, so similar documents occupy neighboring regions of that vector space. One nice property of this model is that queries are just documents too: to answer a query, you compute its position in the vector space and return the closest documents you can find. Because only the words themselves count, not their positions, substrings that swap places do not hurt the match. I am sure there are ready-made solutions for PHP out there.
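A minimal sketch of that idea, not a full search engine. The function names (`termVector`, `cosineSimilarity`, `search`) are made up for illustration, the vectors hold raw word counts with no weighting, and cosine similarity is just one assumed way to measure "closest"; a real engine would likely weight terms and index the corpus up front.

```php
<?php
// Turn a string into a sparse bag-of-words vector: word => count.
// Word order is discarded here, which is the whole point.
function termVector(string $text): array
{
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Euclidean length of a sparse vector.
function vectorNorm(array $vector): float
{
    $sum = 0.0;
    foreach ($vector as $count) {
        $sum += $count * $count;
    }
    return sqrt($sum);
}

// Cosine similarity of two sparse vectors: 1.0 = same direction, 0.0 = no shared words.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $count) {
        if (isset($b[$term])) {
            $dot += $count * $b[$term];
        }
    }
    $normA = vectorNorm($a);
    $normB = vectorNorm($b);
    return ($normA > 0 && $normB > 0) ? $dot / ($normA * $normB) : 0.0;
}

// Treat the query as just another document and rank the corpus against it,
// best match first. Returns document id => score.
function search(string $query, array $documents): array
{
    $queryVector = termVector($query);
    $scores = [];
    foreach ($documents as $id => $text) {
        $scores[$id] = cosineSimilarity($queryVector, termVector($text));
    }
    arsort($scores);
    return $scores;
}
```

With this, `cosineSimilarity(termVector('quick brown fox'), termVector('fox quick brown'))` comes out as 1.0, since both strings map to the same vector even though the words have swapped positions.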

    To make the vector-space results fuzzier, consider stemming or a similar natural-language-processing technique, and use levenshtein to construct secondary queries for similar words that occur in your overall vocabulary.
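The secondary-query step could look roughly like this. It only covers the levenshtein part, not stemming. `similarTerms` and `expandQuery` are hypothetical helper names, and the cutoff of 2 edits is an arbitrary assumption you would want to tune for your vocabulary.

```php
<?php
// Return the vocabulary words within $maxDistance edits of $term, closest first.
function similarTerms(string $term, array $vocabulary, int $maxDistance = 2): array
{
    $matches = [];
    foreach ($vocabulary as $word) {
        $distance = levenshtein($term, $word);
        if ($distance <= $maxDistance) {
            $matches[$word] = $distance;
        }
    }
    asort($matches); // smallest edit distance first
    return array_keys($matches);
}

// Map each query word to the near-matches found in the corpus vocabulary.
// The result can be used to build secondary queries with the corrected words.
function expandQuery(string $query, array $vocabulary, int $maxDistance = 2): array
{
    $expanded = [];
    $terms = preg_split('/\W+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        $expanded[$term] = similarTerms($term, $vocabulary, $maxDistance);
    }
    return $expanded;
}
```

Here levenshtein runs on single words rather than whole strings, so it handles typos while the vector-space step handles word order.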
