Levenshtein Distance Algorithm better than O(n*m)?

时光怂恿深爱的人放手 提交于 2019-11-27 06:23:16

Are you interested in reducing the time complexity or the space complexity ? The average time complexity can be reduced O(n + d^2), where n is the length of the longer string and d is the edit distance. If you are only interested in the edit distance and not interested in reconstructing the edit sequence, you only need to keep the last two rows of the matrix in memory, so that will be order(n).

If you can afford to approximate, there are poly-logarithmic approximations.

For the O(n +d^2) algorithm look for Ukkonen's optimization or its enhancement Enhanced Ukkonen. The best approximation that I know of is this one by Andoni, Krauthgamer, Onak

If you only want the threshold function - eg, to test if the distance is under a certain threshold - you can reduce the time and space complexity by only calculating the n values either side of the main diagonal in the array. You can also use Levenshtein Automata to evaluate many words against a single base word in O(n) time - and the construction of the automatons can be done in O(m) time, too.

Look in Wiki - they have some ideas to improve this algorithm to better space complexity:

Wiki-Link: Levenshtein distance

Quoting:

We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time.

I found another optimization that claims to be O(max(m, n)):

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C

(the second C implementation)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!