Question
I have a database table full of addresses from Google Maps geocoding responses. Google abbreviates all directions (West -> W, East -> E, etc.).
So if I enter an address like "100 West Pender Street", the formatted address returned by Google Maps is "100 W Pender St", which I insert into my table.
Now if a user comes along and searches for that address, all of the following should match:
pender street
west pender street
100 pender
100 w pender
100 west pender
And they more or less do. However, the "W" in the table is ignored because it falls below the minimum word length, so addresses on East Pender are given equal weighting in the search results (the "E" is also ignored).
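A rough way to see why the "W" and "E" rows tie is to approximate a length-filtered full-text index. This Python sketch assumes MyISAM's default `ft_min_word_len` of 4 and ignores stopwords and MySQL's exact tokenizer rules; `indexed_words` is a hypothetical helper, not a MySQL function.

```python
import re

# Assumption: MyISAM's default ft_min_word_len (4). Stopwords and MySQL's
# exact word-character rules are deliberately not modeled.
MIN_WORD_LEN = 4


def indexed_words(text: str, min_len: int = MIN_WORD_LEN) -> list[str]:
    """Return the lower-cased words a length-filtered index would keep (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w.lower() for w in words if len(w) >= min_len]


print(indexed_words("100 W Pender St"))  # ['pender']
print(indexed_words("100 E Pender St"))  # ['pender']
print(indexed_words("123 160 St"))       # []
```

Both Pender addresses reduce to the same single word, so they score identically, and "123 160 St" leaves nothing to match at all.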
What's the best way to handle this?
I suspect setting the minimum word length to 1 is a "bad thing".
I could search the Google addresses for the known abbreviations (N, E, S, W, St, Ave, Dr, etc.) and replace them with their expansions, but for some street names this is not valid: some cities have single-letter street names (J Street, etc.).
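One way to make that search-and-replace safer is to expand an abbreviation only when the neighboring tokens suggest it is not the street name itself. This is a minimal Python sketch under stated assumptions: the two lookup tables are hypothetical and far from complete, only the street line (the part before the first comma) is rewritten, and the rule "a single letter directly followed by a street suffix is the street name" is a heuristic, not something Google documents. A suffix is expanded only at the end of the street line or before a trailing direction, so a leading "St" (as in Saint) is left alone.

```python
# Hypothetical, deliberately incomplete lookup tables.
DIRECTIONS = {"n": "North", "e": "East", "s": "South", "w": "West"}
SUFFIXES = {"st": "Street", "ave": "Avenue", "dr": "Drive", "rd": "Road"}


def _key(token: str) -> str:
    """Normalize a token for lookup: lower-case, trailing period removed."""
    return token.lower().rstrip(".")


def expand_street_line(address: str) -> str:
    """Expand directions and street suffixes in the street line of an address.

    Heuristic (an assumption): a single letter directly followed by a street
    suffix is the street name itself ("W St") and is left alone.
    """
    street, sep, rest = address.partition(",")  # only rewrite the part before the first comma
    tokens = street.split()
    keys = [_key(t) for t in tokens]
    out = []
    for i, (tok, key) in enumerate(zip(tokens, keys)):
        nxt = keys[i + 1] if i + 1 < len(keys) else None
        if key in DIRECTIONS and nxt not in SUFFIXES:
            out.append(DIRECTIONS[key])      # "W Pender" -> "West Pender"
        elif key in SUFFIXES and (nxt is None or nxt in DIRECTIONS):
            out.append(SUFFIXES[key])        # "Pender St" or "Pender St W" -> "Street"
        else:
            out.append(tok)                  # numbers, names, single-letter street names
    return " ".join(out) + sep + rest


print(expand_street_line("100 W Pender St"))    # 100 West Pender Street
print(expand_street_line("123 J St"))           # 123 J Street
print(expand_street_line("200 W St, Anytown"))  # 200 W Street, Anytown
```

A side benefit is that the expanded directions ("West", "East", "North", "South") are at least four letters long, so they also clear the default minimum word length.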
Also, addresses like "123 160 St" are not searchable at all, because the street number (123) and the street name (160) both fall below the minimum word length.
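For numeric or single-letter tokens, one workaround (not something proposed in the question itself) is to leave the server-wide minimum word length alone and instead store a padded copy of the address in a separate search column, prefixing every token shorter than the minimum so that it gets indexed. The same transformation then has to be applied to the user's query before it reaches `MATCH ... AGAINST`. This is a minimal sketch assuming MyISAM's default `ft_min_word_len` of 4; `pad_short_tokens` is a hypothetical helper, and the `z` padding marker is arbitrary and could collide with a real token that happens to equal a padded one (e.g. a literal "zzzw").

```python
import re

# Assumption: MyISAM's default ft_min_word_len (4). ASCII letters and digits only.
MIN_WORD_LEN = 4


def pad_short_tokens(text: str, min_len: int = MIN_WORD_LEN) -> str:
    """Prefix every token shorter than min_len so it is at least min_len long.

    Hypothetical helper: run it on the text stored in the search column AND on
    the user's query so both sides are transformed identically.
    """
    pad = "z" * (min_len - 1)  # arbitrary marker; 1-char tokens become exactly min_len long
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(pad + t if len(t) < min_len else t for t in tokens)


print(pad_short_tokens("123 160 St"))       # zzz123 zzz160 zzzst
print(pad_short_tokens("100 W Pender St"))  # zzz100 zzzw pender zzzst
```

A search for "123 160" would then be sent as "zzz123 zzz160", which matches the padded row; the cost is a second, larger text column that only this search uses.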
Is MySQL FullText the right approach for this? Does Sphinx offer something better?
Or is there another solution I haven't considered yet? Keep in mind that the user's search query will be matched not only against the property's address but also against other text columns such as the property name and description.
Answer 1:
This is actually an incredibly difficult problem if you're on your own. I work in the address verification industry at a company called SmartyStreets, where our products perform the task you describe. It's a complicated sequence of operations that matches address searches to valid, even deliverable, endpoints. The accreditation for performing address lookups accurately, correctly, and completely is called CASS Certification (the USPS Coding Accuracy Support System).
The difference between Google's results and CASS-Certified results is that Google's algorithms produce a "best guess". That is what Google is good at; unfortunately, it also produces best guesses for addresses that aren't perfectly valid. (See: http://answers.smartystreets.com/questions/269/why-did-the-address-fail-validation-it-looks-good-to-me)
Fuzzy lookups with MySQL will return results, and your code can add algorithms to help, but there's no guarantee that those results are accurate or valid, or in that case, worth anything.
I don't think you want your users to get wrong addresses in response to their queries. That makes your service appear sub-par, and users won't get the value they expect (right?). I suggest you find a vendor of CASS software; you can Google "address verification", for example. The best web-based solution I can recommend is SmartyStreets' LiveAddress API.
Source: https://stackoverflow.com/questions/7958267/fuzzy-street-address-searches-using-mysql-fulltext-or-sphinx