I have 5000, sometimes more, street address strings in an array. I\'d like to compare them all with levenshtein to find similar matches. How can I do this without looping throug
You can use a bk-tree to speed-up the search/comparison.
http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees says:
Now we can make a particularly useful observation about the Levenshtein Distance: It forms a Metric Space.
[...]
Assume for a moment we have two parameters, query, the string we are using in our search, and n the maximum distance a string can be from query and still be returned. Say we take an arbitary string, test and compare it to query. Call the resultant distance d. Because we know the triangle inequality holds, all our results must have at most distance d+n and at least distance d-n from test.
[...]
Tests show that searching with a distance of 1 queries no more than 5-8% of the tree, and searching with two errors queries no more than 17-25% of the tree - a substantial improvement over checking every node!
edit: But that doesn't help you with your ("12 Bird Road, Apt 6" and "12 Bird Rd. #6") problem. Only with your brute-force m*n comparison.