openrefine

OpenRefine changing the port and host when executable is run directly

荒凉一梦 submitted on 2020-01-15 05:37:07
Question: The refine.ini file allows setting the port and host without rebuilding, but it says the following:

    # NOTE: This file is not read if you run the Refine executable directly
    # It is only read of you use the refine shell script or refine.bat

From my limited observation, I noticed that when the executable is run directly, the values for port and host are always the defaults set in Refine.java. Is there a way to change the port and host when running the executable directly without the

Couple the data in all possible combinations

大城市里の小女人 submitted on 2019-12-25 09:15:50
Question: I have data in two columns like this:

    Id  Value
    1   a
    2   f
    1   c
    1   h
    2   a

I'd like to pair the values in the 'Value' column in all possible combinations within the same Id, such as (a,c) (a,h) (c,h) (f,a). Is there any R, Python, or VBA code for this task?

Answer 1: To return a character matrix with these combinations using base R, try

    do.call(rbind, t(sapply(split(df, df$Id), function(i) t(combn(i$Value, 2)))))
         [,1] [,2]
    [1,] "a"  "c"
    [2,] "a"  "h"
    [3,] "c"  "h"
    [4,] "f"  "a"

Each row is a
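For the Python side of the question, the same grouping-then-pairing idea can be sketched with the standard library. This is a minimal illustration rather than part of the original answer; the function name `pair_values_by_id` and the list-of-tuples input are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def pair_values_by_id(rows):
    """Return every 2-element combination of values that share an Id.

    `rows` is an iterable of (id, value) tuples (hypothetical input shape).
    """
    groups = defaultdict(list)
    for row_id, value in rows:
        groups[row_id].append(value)  # keep values in their original order

    pairs = []
    for row_id in sorted(groups):
        # combinations() yields each unordered pair exactly once
        pairs.extend(combinations(groups[row_id], 2))
    return pairs


rows = [(1, "a"), (2, "f"), (1, "c"), (1, "h"), (2, "a")]
pairs = pair_values_by_id(rows)
print(pairs)
```

An Id with a single value simply contributes no pairs, since `combinations` returns nothing for a list shorter than 2.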

Removing duplicate strings from a comma separated list, in a cell

橙三吉。 submitted on 2019-12-24 06:45:11
Question: I'm using Google Sheets and this is way beyond my simple scripting. I have numerous cells containing comma-separated values:

    AA, BB, CC, BBB, CCC, CCCCC, AA, BBB, BB
    BB, ZZ, ZZ, AA, BB, CC, BBB, CCC, CCCCC, AA, BBB, BB

I'm trying to return:

    AA, BB, CC, BBB, CCC, CCCCC etc.
    BB, ZZ, AA, CC, BBB, CCC, CCCCC etc.

That is, remove the duplicates, per cell. I can't get my head around a solution. I've tried every online tool that removes duplicates, but they all remove duplicates throughout my document.
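The per-cell logic can be sketched in Python as follows. This illustrates only the deduplication step, not a Google Sheets formula or script; the function name `dedupe_cell` is hypothetical, and it assumes each cell arrives as a single string.

```python
def dedupe_cell(cell, sep=","):
    """Remove repeated items from one delimited cell, keeping first occurrences."""
    seen = set()
    unique = []
    for item in cell.split(sep):
        item = item.strip()  # ignore the spaces around each separator
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return ", ".join(unique)


cells = [
    "AA, BB, CC, BBB, CCC, CCCCC, AA, BBB, BB",
    "BB, ZZ, ZZ, AA, BB, CC, BBB, CCC, CCCCC, AA, BBB, BB",
]
# Each cell is processed on its own, so duplicates across cells are kept.
cleaned = [dedupe_cell(cell) for cell in cells]
print(cleaned)
```

Because the `seen` set is created inside the function, nothing carries over from one cell to the next, which is the behaviour the online tools in the question lacked.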

Openrefine - Transpose rows into columns based on text

女生的网名这么多〃 submitted on 2019-12-24 00:45:02
Question: I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns. The data is in this one column in the following order:

    Title
    Document Type
    Author
    Date

But in some cases, the catalogue records appear in the order:

    Title
    Document Type
    Synopsis
    Author
    Date

Therefore I cannot transpose these records into columns based on the number of rows. Each title has the

How to integrate tabular data into GraphDB automatically?

↘锁芯ラ submitted on 2019-12-23 03:36:26
Question: I want to import tabular (XLS) data automatically into GraphDB. OntoRefine suits my case very well, with the power of OpenRefine and SPARQL. Now, I am thinking about the following approach:

    1. New tabular data is available as an XLS file
    2. OntoRefine updates a project or creates a new project automatically
    3. SPARQL queries against RDFbridge to create new triples
    4. SPARQL Insert to add these triples

Is there an alternative approach to automate it? If this is the best, how can I update or create a new

How to perform approximate (fuzzy) name matching in R

余生颓废 submitted on 2019-12-21 05:00:22
Question: I have a large data set, dedicated to biological journals, which was compiled over a long time by different people. So, the data are not in a single format. For example, in the column "AUTHOR" I can find John Smith, Smith John, Smith J and so on, while it is the same person. I cannot perform even the simplest actions. For example, I can't figure out which authors wrote the most articles. Is there any way in R to determine if the majority of symbols in the different names is the same, take
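One common starting point for this kind of clean-up is key-collision clustering: reduce each name to a normalized key and group the names that share a key. The sketch below is in Python rather than R and is only an illustration of the idea, in the spirit of OpenRefine's fingerprint clustering; the function names `fingerprint` and `cluster_names` are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(name):
    """Build an order-insensitive key: lowercase, drop punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_names(names):
    """Group the spellings that collapse to the same fingerprint key."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)


names = ["John Smith", "Smith John", "Smith, John", "Smith J"]
clusters = cluster_names(names)
for key, members in clusters.items():
    print(key, "->", members)
```

This groups "John Smith", "Smith John" and "Smith, John" under one key, but "Smith J" stays separate because an initial is a different token from the full given name. Catching that case would need an extra rule (for example, reducing given names to initials) or an edit-distance comparison, both of which risk merging distinct authors.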

getting error while importing rdf [closed]

泪湿孤枕 submitted on 2019-12-13 09:34:43
Question: Closed 5 years ago. This question needs details or clarity and is not currently accepting answers.

I was trying to import the Freebase RDF dump into Google Refine but got an error. Now, how can I extract topic names with their notable type from the 18 GB RDF file to CSV or a similar format? Is there any GUI tool?

Answer 1: 146 GB is too big for OpenRefine (ex-Google Refine) to handle. If there is a GUI tool that will do this out of the box,

How can I access the API of OntoRefine?

半世苍凉 submitted on 2019-12-13 03:28:50
Question: In our current project we have a lot of data in table form that we want to transform to RDF. OpenRefine offers the possibility to create projects or update data via an API (see: https://github.com/OpenRefine/OpenRefine/wiki/OpenRefine-API). Is it possible to use this API with OntoRefine and, if so, how do I do it? Or are we better advised to use OpenRefine? This question was similarly asked a little over a year ago but did not receive an answer. (How to integrate tabular data into GraphDB

Best way to parse a big and intricated Json file with OpenRefine (or R)

余生长醉 submitted on 2019-12-12 08:55:53
Question: I know how to parse JSON cells in OpenRefine, but this one is too tricky for me. I've used an API to extract the calendars of 4730 Airbnb rooms, identified by their IDs. Here is an example of one JSON file: https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions For each ID and each day of the year from now until November 2017, I would like to extract the availability of