google-refine

Script-driven automation of Google Refine with Ruby, Python, Perl, Java, or otherwise

限于喜欢 submitted on 2019-12-12 13:30:59
Question: BACKGROUND: Co-worker Adam has been using Google Refine to process database downloads with much success over the last year or so, but Adam got a new job offer, and consequently all of the work and expertise he has built up in Google Refine is going away. Ben would like Adam to package all of the work he has done with Google Refine so that the users in the office can still benefit from it without having to know how to use Google Refine itself (i.e., run it as part of a batch

Can I call external *Python* functions from Google Refine?

南笙酒味 submitted on 2019-12-05 08:54:27
Question: I'm investigating Google Refine to speed up some of my data work. I had never used it before this week, but I like a lot of what I see. My biggest question so far is whether it's possible to call external Python functions from Refine. I know you can call Jython internally, but that doesn't provide access to C-based Python libraries (e.g. lxml), and I have scripts elsewhere that I'd like to integrate without lots of copy-paste or rewrite hassle. What options are there for doing this in Refine? I

Google Refine recipe for reconciling messy entities in two databases

廉价感情. submitted on 2019-12-03 17:37:49
Question: I have two databases of messy names such as these: Jindal, Bobby Fla. Gov. Bobby Jindal Bobby Jindal 3M Corp. 3M Menomonie I need to find the matches. Can anyone point me to or suggest a good recipe for how to do this in Google Refine? This link gives me a starting point, but I could use further advice: http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/ Answer 1: You could try our Refine extension; see especially the reconciliation part of the doc. Answer 2: The cell.cross function is similar to VLOOKUP in Excel: it will match only if your two cells are identical. If you
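
The point about identical cells can be illustrated outside Refine with a small Python sketch. It contrasts an exact-match lookup (the behaviour the answer attributes to cell.cross) with a lookup on a normalized key. The `fingerprint` helper is a simplified, hypothetical approximation of a key-collision approach, not Refine's own implementation, and the split of the sample names into two lists is an assumption made for illustration.

```python
import re
import unicodedata


def fingerprint(name):
    """Hypothetical normalized key: ASCII-fold, lowercase, drop punctuation,
    then sort the unique tokens so that word order no longer matters."""
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    return " ".join(sorted(set(text.split())))


def exact_lookup(names_a, names_b):
    """Return the names in names_a that appear verbatim in names_b."""
    known = set(names_b)
    return [name for name in names_a if name in known]


def keyed_lookup(names_a, names_b):
    """Map each name in names_a to the names in names_b sharing its key."""
    index = {}
    for name in names_b:
        index.setdefault(fingerprint(name), []).append(name)
    return {name: index.get(fingerprint(name), []) for name in names_a}


# Assumed split of the sample names from the question into two "databases".
names_a = ["Jindal, Bobby", "3M Corp."]
names_b = ["Bobby Jindal", "Fla. Gov. Bobby Jindal", "3M Menomonie"]

exact = exact_lookup(names_a, names_b)
keyed = keyed_lookup(names_a, names_b)
print(exact)
print(keyed)
```

With these assumed lists, the exact lookup finds nothing, while the keyed lookup pairs "Jindal, Bobby" with "Bobby Jindal". Names that differ by more than punctuation and word order, such as "3M Corp." and "3M Menomonie", still need fuzzier matching or a reconciliation service.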

Parse JSON in Google Refine

萝らか妹 submitted on 2019-11-29 04:25:57
I'm trying to pull out specific elements from results from the Data Science Toolkit coordinates2politics API, using Google Refine. Here is sample cell #1: [{"politics":[ {"type":"admin2","friendly_type":"country","code":"usa","name":"United States"}, {"type":"admin6","friendly_type":"county","code":"55_025","name":"Dane"}, {"type":"constituency","friendly_type":"constituency","code":"55_02","name":"Second district, WI"}, {"type":"admin5","friendly_type":"city","code":"55_48000","name":"Madison"}, {"type":"admin5","friendly_type":"city","code":"55_53675","name":"Monona"}, {"type":"admin4",
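
As a rough illustration of the extraction being asked about, the sketch below parses a cell of this shape in plain Python and pulls out names by `friendly_type`. It uses only the five complete entries visible in the sample; the rest of the cell is cut off in the source, so the closing brackets here are an assumption about its structure. The helper name `names_by_type` is hypothetical. Inside Refine itself, the analogous starting point would presumably be GREL's `parseJson()` applied to the cell value.

```python
import json

# First five entries of the sample cell; the remaining entries are cut off in
# the source, so the structure is closed here by assumption.
cell = """[{"politics":[
 {"type":"admin2","friendly_type":"country","code":"usa","name":"United States"},
 {"type":"admin6","friendly_type":"county","code":"55_025","name":"Dane"},
 {"type":"constituency","friendly_type":"constituency","code":"55_02","name":"Second district, WI"},
 {"type":"admin5","friendly_type":"city","code":"55_48000","name":"Madison"},
 {"type":"admin5","friendly_type":"city","code":"55_53675","name":"Monona"}
]}]"""


def names_by_type(cell_text, friendly_type):
    """Return the 'name' of every politics entry with the given friendly_type."""
    records = json.loads(cell_text)
    return [
        entry["name"]
        for record in records
        for entry in (record.get("politics") or [])
        if entry.get("friendly_type") == friendly_type
    ]


cities = names_by_type(cell, "city")
counties = names_by_type(cell, "county")
print(cities)
print(counties)
```

Filtering on `friendly_type` rather than on a fixed position keeps the lookup stable when the entries come back in a different order or when a type appears more than once, as "city" does here.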
