How to merge rows in OpenRefine

Submitted by 拥有回忆 on 2019-12-09 03:56:30

Question


How to merge rows based on some ID field?

Original Table                  New Table

ID   | Field1 | Field2          ID   | Field1 | Field2
-----|--------|--------         -----|--------|--------
A    | 5      |                 A    | 5      | 10
A    |        | 10              B    | 1      | 3
B    | 1      |                 C    | 4      | 150
B    |        | 3
C    | 4      |
C    |        | 150

I want to fill each cell from the values found in the group of rows that share the same ID.

That is, I want to aggregate the table by ID, using "the non-empty value in each column" as the aggregation function.
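
For reference, the requested aggregation can be sketched outside OpenRefine in plain Python. This only illustrates the intended result and is not an OpenRefine feature; the function name `merge_rows` and the list-of-dicts row layout are assumptions made for the example.

```python
def merge_rows(rows, key="ID"):
    """Collapse rows sharing the same key, keeping the first non-empty value per column."""
    merged = {}  # key value -> merged row (dicts keep insertion order)
    for row in rows:
        target = merged.setdefault(row[key], {key: row[key]})
        for column, value in row.items():
            if column == key:
                continue
            if value != "" and target.get(column, "") == "":
                # First non-empty value seen for this column in this group.
                target[column] = value
            else:
                # Make sure the column exists even if it is still empty.
                target.setdefault(column, "")
    return list(merged.values())


question_rows = [
    {"ID": "A", "Field1": "5", "Field2": ""},
    {"ID": "A", "Field1": "", "Field2": "10"},
    {"ID": "B", "Field1": "1", "Field2": ""},
    {"ID": "B", "Field1": "", "Field2": "3"},
    {"ID": "C", "Field1": "4", "Field2": ""},
    {"ID": "C", "Field1": "", "Field2": "150"},
]
question_result = merge_rows(question_rows)
for row in question_result:
    print(row)
```

If a group held two different non-empty values in the same column, this sketch would keep the first one; the question does not say what should happen in that case.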


Answer 1:


I think a simpler solution would be to use:

1° The feature "Edit Cells / Blank Down" on your ID column, so that each ID is kept only on the first row of its group.

2° Then "Edit Cells / Join multi-valued cells" on the last column only (Field2), which moves each Field2 value up into the first row of its group and should leave one row per ID.




Answer 2:


In the ID column, use the menu option Edit Cells -> Blank down. This should leave you with a table looking like:

ID   | Field1 | Field2
-----|--------|--------
A    | 5      |
     |        | 10
B    | 1      |
     |        | 3
C    | 4      |
     |        | 150

Make sure you are in "Records" mode (this option is at the top left of the data grid). You should see the rows for each ID are grouped together.

Now use Edit Cells -> Join multi-valued cells on each of the other columns. Once you have done this for all columns, you should be left with a single row per record.
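
To make the effect of these two operations easier to follow, here is a rough Python simulation. It is a sketch of the behaviour described above, not OpenRefine code: the helpers `blank_down` and `join_records` are hypothetical names, rows are assumed to be dicts of strings, and the join is applied to all non-key columns at once rather than one column at a time.

```python
def blank_down(rows, column):
    """Blank a cell when it repeats the value of the cell directly above it."""
    out, previous = [], None
    for row in rows:
        new_row = dict(row)
        value = row[column]
        if value == "":
            previous = None
        elif value == previous:
            new_row[column] = ""
        else:
            previous = value
        out.append(new_row)
    return out


def join_records(rows, key, separator=", "):
    """Group rows into records (a non-blank key starts a record), then join each column."""
    records = []
    for row in rows:
        if row[key] != "" or not records:
            records.append([row])
        else:
            records[-1].append(row)  # blank key: belongs to the record above
    joined = []
    for record in records:
        first = dict(record[0])
        for column in first:
            if column != key:
                values = [r[column] for r in record if r[column] != ""]
                first[column] = separator.join(values)
        joined.append(first)
    return joined


table = [
    {"ID": "A", "Field1": "5", "Field2": ""},
    {"ID": "A", "Field1": "", "Field2": "10"},
    {"ID": "B", "Field1": "1", "Field2": ""},
    {"ID": "B", "Field1": "", "Field2": "3"},
    {"ID": "C", "Field1": "4", "Field2": ""},
    {"ID": "C", "Field1": "", "Field2": "150"},
]
blanked = blank_down(table, "ID")
records_joined = join_records(blanked, "ID")
```

Because each column here holds only one non-empty value per record, the separator never shows up in the output; with several values per record they would be concatenated.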




Answer 3:


On the "ID" column, use "Add column based on this column" with the following expression, where "ProjectName" stands for the name of your own project:

filter(
  cell.cross("ProjectName", "ID").cells["Field1"].value,
  v,
  isNonBlank(v)
)[0]

This sets the value on every row that shares the same ID. Repeat with "Field2" to create the second new column.

Original Table           New Table

ID   | Field1 | Field2 | Field1_ | Field2_
-----|--------|--------|---------|--------
A    | 5      |        | 5       | 10
A    |        | 10     | 5       | 10
B    | 1      |        | 1       | 3
B    |        | 3      | 1       | 3
C    | 4      |        | 4       | 150
C    |        | 150    | 4       | 150

Remove the old columns.

After that, remove the duplicate rows using the "blank down + facet by blank + remove matching rows" approach.
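
The same idea (look up all rows with the same ID, take the first non-blank value, then drop duplicates) can be sketched in Python as follows. This is an illustration only: `first_non_blank` and `fill_and_dedupe` are hypothetical helper names, and duplicates are removed by comparing whole rows, which is a simplification of the blank-down-and-facet approach.

```python
def first_non_blank(rows, key_value, column, key="ID"):
    """Return the first non-blank value of `column` among rows whose key matches."""
    candidates = [r[column] for r in rows if r[key] == key_value]
    non_blank = [v for v in candidates if v.strip() != ""]
    return non_blank[0] if non_blank else ""


def fill_and_dedupe(rows, columns, key="ID"):
    """Fill every row from its ID group, then keep one copy of each identical row."""
    filled = [
        {key: r[key], **{c: first_non_blank(rows, r[key], c, key) for c in columns}}
        for r in rows
    ]
    unique, seen = [], set()
    for r in filled:
        signature = tuple(r.items())
        if signature not in seen:
            seen.add(signature)
            unique.append(r)
    return unique


source_rows = [
    {"ID": "A", "Field1": "5", "Field2": ""},
    {"ID": "A", "Field1": "", "Field2": "10"},
    {"ID": "B", "Field1": "1", "Field2": ""},
    {"ID": "B", "Field1": "", "Field2": "3"},
    {"ID": "C", "Field1": "4", "Field2": ""},
    {"ID": "C", "Field1": "", "Field2": "150"},
]
deduped = fill_and_dedupe(source_rows, ["Field1", "Field2"])
```

The lookup scans the whole table for every row, which is fine for a small example but scales quadratically with the number of rows.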




Answer 4:


It's not OpenRefine, but I think it's a really good tool for an OpenRefine user. You could run this Miller (https://github.com/johnkerl/miller) command:

mlr --csv reshape -r "Field" -o item,value \
then filter -x -S '$value==""' \
then reshape -s item,value input.csv

to get:

ID,Field1,Field2
A,5,10
B,1,3
C,4,150

First I create a tidy (long) version of the data (https://vita.had.co.nz/papers/tidy-data.pdf), dropping the rows with an empty value, and then I transform it back from long to wide format.
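
The wide-to-long-to-wide round trip can also be written out in a few lines of Python, which may help show what each step of the pipeline does. This is a sketch of the technique, not a reimplementation of Miller; the helper names `wide_to_long` and `long_to_wide` are assumptions, while the `item` and `value` column names mirror those used in the command above.

```python
def wide_to_long(rows, key="ID"):
    """Emit one (key, item, value) row per non-key cell."""
    return [
        {key: r[key], "item": column, "value": value}
        for r in rows
        for column, value in r.items()
        if column != key
    ]


def long_to_wide(long_rows, key="ID"):
    """Rebuild one row per key, with one column per distinct item."""
    wide = {}
    for r in long_rows:
        wide.setdefault(r[key], {key: r[key]})[r["item"]] = r["value"]
    return list(wide.values())


wide_rows = [
    {"ID": "A", "Field1": "5", "Field2": ""},
    {"ID": "A", "Field1": "", "Field2": "10"},
    {"ID": "B", "Field1": "1", "Field2": ""},
    {"ID": "B", "Field1": "", "Field2": "3"},
    {"ID": "C", "Field1": "4", "Field2": ""},
    {"ID": "C", "Field1": "", "Field2": "150"},
]
# Step 1: wide -> long. Step 2: drop empty values. Step 3: long -> wide.
tidy = [r for r in wide_to_long(wide_rows) if r["value"] != ""]
reshaped = long_to_wide(tidy)
```

In this sketch, an ID with no non-empty value for some field simply ends up without that column in its row.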



Source: https://stackoverflow.com/questions/58677751/how-to-merge-rows-in-openrefine
