Is Approximate String Matching / Fuzzy String Searching possible with BigQuery?

Submitted by 谁说我不能喝 on 2021-01-27 15:00:52

Question


Thanks to Google for delivering BigQuery, it's great!
Is Approximate String Matching / Fuzzy String Searching possible with BigQuery?
Does Google have plans to add this functionality to BigQuery?

Surely Google's proprietary approximate string matching algorithm could be used to deliver this capability in BigQuery while still protecting Google's intellectual property. We've searched all the BigQuery documentation and Stack Overflow questions. Of course, there are many algorithms for this, but how can they be integrated with BigQuery?

Our need is simple: to compare two strings that will be mostly the same but could differ slightly. For example:

"Rhodes USA" vs. "Rhodes USA, LLC" vs. "Rhodes USA LLC".

From our BigQuery tests, it appears that two strings need to match EXACTLY for BigQuery to JOIN them, even down to the number of trailing spaces in each string. Adding this functionality, or guidance on integrating it with BigQuery, would be greatly appreciated. This is in support of Milwaukee Jets, a regional, innovative, fractional jet ownership company in Milwaukee, WI. Thanks again to Google for delivering BigQuery.

Thank you very much and best regards, Andrew Paullin (414) 212-5372


Answer 1:


Unfortunately, approximate string matching is not supported. The closest you can get is by using regular expressions. Your best bet may be to normalize the data before it gets to BigQuery, i.e., transform "Rhodes USA" and "Rhodes, USA. " into the same string. I'll file a feature request for this support, however.
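
As an illustration of the normalization idea, here is a minimal Python sketch that could run before the data is loaded into BigQuery. It is only one possible approach, not something from the BigQuery documentation: the function name `normalize_name` and the `LEGAL_SUFFIXES` list are hypothetical, and the rules (lowercase, strip punctuation, collapse whitespace, drop trailing legal suffixes) are assumptions that would need tuning for real data.

```python
import re

# Hypothetical list of legal suffixes to drop; extend it for your own data.
LEGAL_SUFFIXES = {"llc", "inc", "ltd", "corp", "co"}


def normalize_name(raw: str) -> str:
    """Return a canonical form of a company name for exact-match joins.

    Assumption: names are ASCII; non-ASCII letters would be treated as
    separators by the pattern below.
    """
    # Lowercase, then turn every run of non-alphanumeric characters
    # (punctuation, repeated or trailing spaces) into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    tokens = cleaned.split()
    # Drop trailing legal suffixes such as "LLC", but never the whole name.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


names = ["Rhodes USA", "Rhodes USA, LLC", "Rhodes USA LLC", "Rhodes, USA. "]
for name in names:
    print(repr(name), "->", repr(normalize_name(name)))
```

With these rules, all four sample strings reduce to `rhodes usa`, so storing the normalized value in its own column would let an ordinary exact-match JOIN treat them as the same company. The trade-off is that overly aggressive rules can merge names that are genuinely different, so the suffix list should be kept deliberately small.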



Source: https://stackoverflow.com/questions/10546130/is-approximate-string-matching-fuzzy-string-searching-possible-with-bigquery
