Match 2 lists of strings by resemblance

Submitted by 为君一笑 on 2019-12-05 22:18:08

Once you have established the metric you want to use to keep track of the "distance" between two strings, be it the Levenshtein distance or another one, you can use the Hungarian algorithm to solve your problem.
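As a rough illustration of the first step, here is a minimal C++ sketch that computes the Levenshtein distance and fills the cost matrix an assignment solver would consume. The names levenshtein and build_cost_matrix are made up for this example, and characters are compared byte by byte, so multi-byte encodings would need extra care.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Levenshtein distance using two rolling rows (byte-wise, not Unicode-aware).
int levenshtein(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    std::iota(prev.begin(), prev.end(), 0);  // distance from "" to each prefix of b
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);  // distance from a prefix of a to ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitution});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// cost[i][j] holds the distance between left[i] and right[j].
std::vector<std::vector<int>> build_cost_matrix(
        const std::vector<std::string>& left,
        const std::vector<std::string>& right) {
    std::vector<std::vector<int>> cost(left.size(),
                                       std::vector<int>(right.size()));
    for (std::size_t i = 0; i < left.size(); ++i)
        for (std::size_t j = 0; j < right.size(); ++j)
            cost[i][j] = levenshtein(left[i], right[j]);
    return cost;
}
```

Any other metric can be swapped in by replacing the call inside build_cost_matrix; the solver only sees the numbers.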

I personally have never implemented it, but Wikipedia includes several links that might be of help.
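For orientation, the following is a sketch of one common formulation of the Hungarian algorithm (the variant based on row and column potentials), not code taken from the answer or from the linked pages. The function name hungarian and the convention that the cost matrix has no more rows than columns are assumptions of this example.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Minimum-cost assignment, O(n^2 * m) for an n x m matrix with n <= m.
// Returns match[i] = index of the column assigned to row i.
std::vector<int> hungarian(const std::vector<std::vector<int>>& cost) {
    const int n = static_cast<int>(cost.size());
    const int m = n == 0 ? 0 : static_cast<int>(cost[0].size());
    assert(n <= m && "transpose or pad so that rows <= columns");
    const int INF = std::numeric_limits<int>::max();

    std::vector<int> u(n + 1, 0), v(m + 1, 0);  // row and column potentials
    std::vector<int> p(m + 1, 0);    // p[j]: row matched to column j (1-based, 0 = none)
    std::vector<int> way(m + 1, 0);  // previous column on the augmenting path

    for (int i = 1; i <= n; ++i) {
        p[0] = i;  // column 0 is a virtual column holding the row being inserted
        int j0 = 0;
        std::vector<int> minv(m + 1, INF);
        std::vector<bool> used(m + 1, false);
        do {
            used[j0] = true;
            const int i0 = p[j0];
            int delta = INF, j1 = 0;
            for (int j = 1; j <= m; ++j) {
                if (used[j]) continue;
                const int cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
                if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                if (minv[j] < delta) { delta = minv[j]; j1 = j; }
            }
            for (int j = 0; j <= m; ++j) {
                if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                else         { minv[j] -= delta; }
            }
            j0 = j1;
        } while (p[j0] != 0);  // stop once a free column is reached
        // Flip the matching along the augmenting path back to column 0.
        do {
            const int j1 = way[j0];
            p[j0] = p[j1];
            j0 = j1;
        } while (j0 != 0);
    }

    std::vector<int> match(n, -1);
    for (int j = 1; j <= m; ++j)
        if (p[j] != 0) match[p[j] - 1] = j - 1;
    return match;
}
```

With string distances as costs, match[i] is then the index in the second list paired with string i of the first list. If the first list is the longer one, the matrix would have to be transposed first.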

My suggestion for a possible optimization concerns this step:

"I calculate the Levenshtein distance for each possible pair of string and store the results in a 2-dimension array."

You can avoid computing the distance for every possible pair of strings by considering their lengths. Suppose:

1. the pair is, e.g., "ab" and "cdefg",
2. and you know that there is another string with a length similar to that of "ab", e.g. "xy".

Then you shouldn't need to calculate the distance between "ab" and "cdefg". The Levenshtein distance between two strings is at least the difference of their lengths, so for that pair it is at least 3, whereas the distance between two strings of equal length is at most that length, so for "ab" and "xy" it is at most 2.

You can do this by using a smarter data structure that keeps track of the lengths of the strings, e.g. unordered_map<int, vector<string> > in C++0x or TR1 C++.
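One way the length-based pruning could look is sketched below. All names (LengthIndex, index_by_length, find_nearest, Nearest) are hypothetical, and the sketch only finds the closest candidate for a single string. It does not show that skipping those pairs keeps a global assignment optimal.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

using LengthIndex = std::unordered_map<std::size_t, std::vector<std::string>>;

// Plain Levenshtein distance (byte-wise), included to keep the sketch self-contained.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Group the strings by length.
LengthIndex index_by_length(const std::vector<std::string>& words) {
    LengthIndex index;
    for (const std::string& w : words) index[w.size()].push_back(w);
    return index;
}

struct Nearest {
    std::string word;
    std::size_t distance = std::numeric_limits<std::size_t>::max();
    std::size_t evaluated = 0;  // number of full distance computations performed
};

Nearest find_nearest(const std::string& query, const LengthIndex& index) {
    // Length difference = lower bound on the distance to any string in a bucket.
    auto gap = [&](std::size_t len) {
        return len > query.size() ? len - query.size() : query.size() - len;
    };
    std::vector<std::size_t> lengths;
    for (const auto& bucket : index) lengths.push_back(bucket.first);
    std::sort(lengths.begin(), lengths.end(),
              [&](std::size_t x, std::size_t y) { return gap(x) < gap(y); });

    Nearest best;
    for (std::size_t len : lengths) {
        if (gap(len) >= best.distance) break;  // no remaining bucket can beat the best
        for (const std::string& candidate : index.at(len)) {
            const std::size_t d = edit_distance(query, candidate);
            ++best.evaluated;
            if (d < best.distance) { best.distance = d; best.word = candidate; }
        }
    }
    return best;
}
```

With the strings from the example, looking up "ab" in an index built from "cdefg" and "xy" computes only the distance to "xy": once a distance of 2 is known, the bucket of length 5 is skipped because its lower bound is 3.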
