Optimize speed of Levenshtein distance of many words

谁都会走 submitted on 2019-12-06 15:31:05

The function 'strdist' is not a built-in MATLAB function, so I guess you took it from the File Exchange. That's also why both of your approaches take roughly the same time: cellfun internally just expands into a loop.

If strdist is symmetric, i.e. strdist(a,b)==strdist(b,a), you can save half the computations. This seems to be the case, so only calculate the cases with j<i in the second loop (which you are already doing).
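As a rough illustration of the symmetry trick, here is a minimal C sketch that fills a full distance matrix while evaluating the distance only for j<i and mirroring each result. The names `pairwise_symmetric`, `dist_fn`, `length_diff` and `n_calls` are made up for this example; `length_diff` is only a cheap stand-in for strdist (it is not a Levenshtein distance) so that the number of calls can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature of any symmetric string distance (hypothetical type name). */
typedef size_t (*dist_fn)(const char *, const char *);

/* Counts how often the distance function is evaluated (illustration only). */
size_t n_calls = 0;

/* Stand-in for strdist: absolute difference of the string lengths.
 * This is NOT the Levenshtein distance, just a cheap symmetric placeholder. */
size_t length_diff(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    n_calls++;
    return la > lb ? la - lb : lb - la;
}

/* Fill the n-by-n row-major matrix D with pairwise distances.
 * dist is evaluated only for j < i; the value is mirrored to (j, i). */
void pairwise_symmetric(const char *const words[], size_t n,
                        dist_fn dist, size_t *D)
{
    assert(words != NULL && D != NULL);
    for (size_t i = 0; i < n; i++) {
        D[i * n + i] = 0;                    /* distance to itself */
        for (size_t j = 0; j < i; j++) {
            size_t d = dist(words[i], words[j]);
            D[i * n + j] = d;                /* lower triangle */
            D[j * n + i] = d;                /* mirrored upper triangle */
        }
    }
}
```

For n words this makes n(n-1)/2 distance calls instead of n^2, so four words need 6 calls rather than 16.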

Otherwise you could implement strdist in C as a MEX function and probably see some significant speed improvements. A C implementation of the Levenshtein distance can be found, for example, at rosettacode.org.
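To give an idea of what such a C routine might look like, here is a minimal sketch of the standard dynamic-programming Levenshtein distance using a single rolling row. It is not the Rosetta Code version and not the File Exchange strdist; the function names `levenshtein` and `min3` are my own, and it compares plain bytes, so it assumes single-byte characters. To call it from MATLAB you would still have to wrap it in a `mexFunction` gateway, which is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between two NUL-terminated byte strings.
 * Uses O(strlen(b)) extra memory instead of the full DP table.
 * Returns (size_t)-1 if the working row cannot be allocated. */
size_t levenshtein(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);

    /* row[j] = distance between the current prefix of a and b[0..j) */
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    for (size_t j = 0; j <= lb; j++)
        row[j] = j;                   /* empty prefix of a: j insertions */

    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];         /* previous row, column j-1 */
        row[0] = i;                   /* empty prefix of b: i deletions */
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];       /* previous row, column j */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min3(up + 1,          /* deletion */
                          row[j - 1] + 1,  /* insertion */
                          diag + cost);    /* match or substitution */
            diag = up;
        }
    }

    size_t d = row[lb];
    free(row);
    return d;
}
```

The run time per pair is still proportional to the product of the two string lengths; the gain over an interpreted implementation would come from the tight compiled loop, not from a better algorithm.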

Or dig into the details of how the algorithm computes the distance between two strings and see if you can vectorize it and reduce the runtime from quadratic to something less. This, however, is probably not very easy.

Finally, if you have the Parallel Computing Toolbox licensed and a multicore CPU, you can easily parallelize your code, since the strdist calls are completely independent of each other.

There are several much faster methods, such as Levenshtein automata. See:

  1. http://en.wikipedia.org/wiki/Levenshtein_automaton
  2. http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata
  3. https://www.google.com.ng/search?q=Fast+approximate+search+in+large+dictionaries (many different papers; you can also follow the (reverse) references from papers on CiteSeerX)