Google Refine recipe for reconciling messy entities in two databases

Posted by 偶尔善良 on 2019-12-05 02:55:57

Question


I have two databases of messy names such as these:

  • Jindal, Bobby
  • Fla. Gov. Bobby Jindal
  • Bobby Jindal
  • 3M Corp.
  • 3M Menomonie

I need to find the matches. Can anyone point me to or suggest a good recipe for how to do this in Google Refine?

This link gives me a starting point but I could use further advice: http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/


Answer 1:


You could try our Refine extension; see especially the reconciliation part of the documentation.




Answer 2:


The cell.cross function is similar to VLOOKUP in Excel: it matches only when the two cells are identical. If you want to use this method, you will need to cluster and clean your data thoroughly beforehand.
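To illustrate why cleaning has to come first, here is a minimal Python sketch (not Refine code) that contrasts an identical-value lookup with a lookup on a normalized key. The `fingerprint` helper is a hypothetical simplification modeled loosely on key-collision clustering: lowercase, strip punctuation, then sort and de-duplicate the tokens. The split of the sample names into `db_a` and `db_b` is an assumption made for the example.

```python
import string

def fingerprint(name: str) -> str:
    """Hypothetical normalized key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def exact_lookup(left, right):
    """Identical-value lookup: a name matches only a byte-for-byte equal name."""
    right_set = set(right)
    return {name: [name] if name in right_set else [] for name in left}

def keyed_lookup(left, right):
    """Lookup on the normalized key instead of the raw string."""
    index = {}
    for name in right:
        index.setdefault(fingerprint(name), []).append(name)
    return {name: index.get(fingerprint(name), []) for name in left}

# Sample names from the question, split across two assumed databases.
db_a = ["Jindal, Bobby", "3M Corp."]
db_b = ["Bobby Jindal", "Fla. Gov. Bobby Jindal", "3M Menomonie"]

print(exact_lookup(db_a, db_b))
print(keyed_lookup(db_a, db_b))
```

With these inputs the exact lookup finds nothing, while the keyed lookup pairs "Jindal, Bobby" with "Bobby Jindal". "Fla. Gov. Bobby Jindal" still goes unmatched because the extra title tokens change the key, which is the kind of case that needs further cleaning or a fuzzier approach.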

I support Michael's answer. Try a reconciliation service: the RDF one or the open reconcile.
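A reconciliation service typically returns scored candidate matches rather than a yes/no answer. The sketch below imitates that idea with Python's standard-library `difflib`; it is a stand-in for illustration only and is not the API or the scoring method of either service mentioned above. The `normalize` and `candidates` helpers and the 0.6 threshold are assumptions.

```python
import string
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(cleaned.split()))

def candidates(query, pool, threshold=0.6):
    """Return (score, name) pairs at or above the threshold, best match first."""
    scored = []
    for name in pool:
        score = SequenceMatcher(None, normalize(query), normalize(name)).ratio()
        if score >= threshold:
            scored.append((round(score, 2), name))
    return sorted(scored, reverse=True)

# Assumed second database, built from the names in the question.
db_b = ["Bobby Jindal", "Fla. Gov. Bobby Jindal", "3M Menomonie"]

for score, name in candidates("Jindal, Bobby", db_b):
    print(score, name)
```

Here "Bobby Jindal" comes back as a perfect match and "Fla. Gov. Bobby Jindal" as a lower-scoring candidate, while "3M Menomonie" falls below the threshold. In practice the candidates would be reviewed by hand, as with any reconciliation result.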



Source: https://stackoverflow.com/questions/10472601/google-refine-recipe-for-reconciling-messy-entities-in-two-databases
