Levenshtein distance: how to better handle words swapping positions?

忘掉有多难 2021-01-30 02:22

I've had some success comparing strings using the PHP levenshtein function.

However, for two strings which contain substrings that have swapped positions, the algorithm

9 Answers

半阙折子戏 (OP)

2021-01-30 03:20

i believe this is a prime use case for a vector-space search engine.

in this technique, each document essentially becomes a vector with as many dimensions as there are distinct words in the entire corpus; similar documents then occupy neighboring regions of that vector space. in the basic bag-of-words form of the model, a document's position depends only on which words it contains and how often, not on their order, so words that have swapped positions do not move it. one nice property of this model is that queries are also just documents: to answer a query, you simply calculate its position in vector space, and your results are the closest documents you can find. i am sure there are ready-made solutions for PHP out there.
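As a rough illustration of the idea (written in Python for brevity, not taken from any particular search engine), here is a minimal sketch assuming plain whitespace tokenization, raw term counts as vector components, and cosine similarity as the closeness measure. The names `to_vector`, `cosine_similarity`, and `search` are made up for this example.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a text to a sparse term-count vector (word -> count).

    Assumption: lowercase + whitespace split is a good enough tokenizer.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for words missing from b, so only shared words add up.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Treat the query as a document and rank documents by similarity to it."""
    query_vector = to_vector(query)
    scored = [(cosine_similarity(query_vector, to_vector(doc)), doc)
              for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["the quick brown fox", "lazy dogs sleep all day", "fox brown quick the"]
    for score, doc in search("brown fox", docs):
        print(round(score, 3), doc)
```

Because word order is discarded, "the quick brown fox" and "fox brown quick the" map to the same vector and score identically against any query, which is the case Levenshtein distance on the whole string handles poorly. A real engine would typically weight terms (for example with tf-idf) instead of using raw counts.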

to fuzzify results from vector space, you could apply stemming or a similar natural language processing technique, and use levenshtein to construct secondary queries for similar words that occur in your overall vocabulary.
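One way to read the Levenshtein suggestion is as a per-word query expansion step: each query word that is missing from the vocabulary is replaced by the vocabulary words within a small edit distance. The sketch below is in Python and assumes a hypothetical helper `expand_query` and a distance threshold of 1; both are illustrative choices, and stemming is left out.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def expand_query(query, vocabulary, max_distance=1):
    """Build a secondary query from vocabulary words close to each query word.

    Words already in the vocabulary are kept as-is; unknown words are
    replaced by all vocabulary words within max_distance edits, or kept
    unchanged when nothing is close enough.
    """
    expanded = []
    for term in query.lower().split():
        if term in vocabulary:
            expanded.append(term)
            continue
        near = sorted(w for w in vocabulary if levenshtein(term, w) <= max_distance)
        expanded.extend(near if near else [term])
    return " ".join(expanded)


if __name__ == "__main__":
    vocabulary = {"quick", "brown", "fox"}
    print(expand_query("quick brwn fox", vocabulary))
```

The expanded string can then be run through the vector-space search as an ordinary query. Scanning the whole vocabulary for every unknown word is linear in vocabulary size, so a large corpus would likely need an index or a length-based prefilter.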
