Question
What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records in the least time?
Answer 1:
I suggest you read the articles by Navarro mentioned in the References section of the Wikipedia article titled "Approximate string matching". Basing your decision on actual research is always better than relying on suggestions from random strangers, especially if performance on a known set of records is important to you.
Answer 2:
It depends heavily on your data. Some fields can be matched more reliably than others. For example, a postcode has a defined format, so it can be compared differently from free-form strings. People can be matched on initials and date of birth, or on other combinations of fields.
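As a rough illustration of the field-specific idea, here is a minimal Python sketch. It assumes each record is a dict with hypothetical `name`, `dob` and `postcode` keys (none of these come from the question). It normalizes postcodes so they can be compared exactly, and groups people by initials plus date of birth so that any more expensive fuzzy comparison only has to run within each small group rather than across all 100,000 records.

```python
from datetime import date


def normalize_postcode(raw: str) -> str:
    """Remove all whitespace and upper-case, so 'sw1a 1aa' equals 'SW1A1AA'."""
    return "".join(raw.split()).upper()


def initials(full_name: str) -> str:
    """First letter of each whitespace-separated name part, upper-cased."""
    return "".join(part[0].upper() for part in full_name.split())


def blocking_key(record: dict) -> tuple:
    """Cheap key: records with different keys are never compared.

    Assumes hypothetical 'name' and 'dob' fields on every record.
    """
    return (initials(record["name"]), record["dob"])


def candidate_pairs(records: list) -> list:
    """Return index pairs of records that share a blocking key."""
    buckets = {}
    for idx, rec in enumerate(records):
        buckets.setdefault(blocking_key(rec), []).append(idx)

    pairs = []
    for ids in buckets.values():
        # Only records inside the same bucket are paired up.
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.append((ids[i], ids[j]))
    return pairs


records = [
    {"name": "John Smith", "dob": date(1980, 1, 2), "postcode": "sw1a 1aa"},
    {"name": "Jon Smyth", "dob": date(1980, 1, 2), "postcode": "SW1A1AA"},
    {"name": "Jane Doe", "dob": date(1975, 6, 30), "postcode": "EC1A 1BB"},
]

pairs = candidate_pairs(records)
same_postcode = (
    normalize_postcode(records[0]["postcode"])
    == normalize_postcode(records[1]["postcode"])
)
```

Whether initials plus date of birth is a sensible key depends entirely on how clean those fields are in your data; a typo in the date of birth would put two matching people in different groups, so treat this as a starting point rather than a recommendation.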
Source: https://stackoverflow.com/questions/491148/best-fuzzy-matching-algorithm