PHP/MySQL small-scale fuzzy search

风流意气都作罢 提交于 2019-11-27 14:13:33

问题


I'm looking to implement fuzzy search for a small PHP/MySQL application. Specifically, I have a database with about 2400 records (records added at a rate of about 600 per year, so it's a small database). The three fields of interest are street address, last name and date. I want to be able to search by one of those fields, and essentially have tolerance for spelling/character errors. i.e., an address of "123 Main Street" should also match "123 Main St", "123 Main St.", "123 Mian St", "123 Man St", "132 Main St", etc. and likewise for name and date.

The main issues I have with answers to other similar questions:

  • It's impossible to define synonyms for every possible incorrect spelling, forget doing so for dates and names.
  • Lucene, etc. seems very heavy-weight for such a limited search data set (call it a maximum of 5,000 records, 3 fields per record).
  • Just doing something with wildcards doesn't seem logical with all of the possible spelling errors.

Any suggestions? I know it isn't going to be possible to do natively with MySQL, but since the data set is so limited, I'd like to keep it relatively simple... perhaps a PHP class that gets all of the records from the DB, uses some sort of comparison algorithm, and returns the IDs of the similar records?

Thanks, Jason


回答1:


Razzie's answer (or using Damerau–Levenshtein) ranks a list of candidates matches according to their closeness to the search key. (Take care: if the key is "12 Main St" then "13 Main St" has the same typing distance as "12 Moin St" but you might want to rank it low or even exclude it, as with 11 and 22 Main St etc.)

But how do you select a list of candidates of a manageable size to rank?

One way is to compute the metaphone value (or values, using double-metaphone) for each word in the strings your going to search. Save each of these metaphones in another table with the id of the row containing the original string. You can then search these metaphone values quickly with LIKE 'key%' where key is the metaphone of a word from the search text.

Check out the suggested answer on this thread. It's quite neat and should work nicely for DBs that aren't huge.




回答2:


If it is a very small database, you could load all the data at once and use an algorithm like Jaro-Winkler for your search. They have an implementation in PHP, which you can find here.

Imho it works really well. Take a look at an example implementation here. I know that that search uses the same algorithm, and it can find 'Nintedno' very well. It also sorts the results for you, based on which result best matches your query.



来源:https://stackoverflow.com/questions/1914671/php-mysql-small-scale-fuzzy-search

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!