OpenRefine - Transpose rows into columns based on text

Submitted by 女生的网名这么多〃 on 2019-12-24 00:45:02

Question


I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns.

The data is in this one column in the following order: Title, Document Type, Author, Date

But in some cases, the catalogue records appear in the order: Title, Document Type, Synopsis, Author, Date

Therefore I cannot transpose these records into columns based on the number of rows.

Each title has the word "Description" ahead of it. This is the one regular feature throughout the entire dataset.

Is there a way to use OpenRefine to transpose rows into columns based on the text in a column? For example, could I transpose the x rows after the row containing "Description", up to the next instance of the word "Description"?


Answer 1:


The approach I'd suggest is to group your rows into OpenRefine 'records' - I'd approach this as follows:

  • Import the data into OpenRefine as it is
  • Write a 'custom text facet' with the GREL value.startsWith("Description")
  • Select the rows for which this facet shows 'true' - this should give you all the rows containing titles
  • Still with this facet choice applied, use 'add column based on this column' to add a new column which contains just the titles
  • Move this new column to the start (left hand) of your project
  • Switch to 'Records' mode
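
To make the grouping idea concrete outside OpenRefine, here is a minimal Python sketch of what Records mode effectively does with the steps above: every row that starts with "Description" opens a new record, and the rows that follow belong to it. The function name group_into_records and the sample values are made up for illustration, and the sketch assumes the word "Description" sits at the start of the same cell as the title.

```python
def group_into_records(rows, marker="Description"):
    """Group a flat list of cell values into records.

    A new record starts at every value that begins with `marker`;
    values seen before the first marker are ignored.
    """
    records = []
    for value in rows:
        if value.startswith(marker):
            records.append([value])      # marker row opens a new record
        elif records:
            records[-1].append(value)    # other rows join the current record
    return records


# Hypothetical sample data: one record without a synopsis, one with.
rows = [
    "Description Title A", "Book", "Author A", "2001",
    "Description Title B", "Book", "A short synopsis", "Author B", "2002",
]
records = group_into_records(rows)
for record in records:
    print(len(record), record)
```

The first record ends up with four values and the second with five, which is the same uneven shape you will see in the OpenRefine records.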

You should now see that you have a single record for each set of rows which relate to the same title. You can now use the option to "Join multi-valued cells" to get the title, document type, synopsis (if it exists), author, and date into a single cell, choosing a separator that does not occur anywhere in the data.

Now use 'split into several columns' to split the values across columns.
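
As a rough illustration of the join and split steps, the Python sketch below joins each record into one string and then splits it back out into columns, padding shorter rows with blanks. The "|" separator, the function name join_and_split and the sample records are assumptions for the example; in OpenRefine you would pick whatever separator never appears in your data.

```python
SEPARATOR = "|"  # assumption: a character that never occurs in the data


def join_and_split(records, separator=SEPARATOR):
    """Join each record into one cell, then split it into padded columns."""
    # "Join multi-valued cells": one string per record
    joined = [separator.join(record) for record in records]
    # "Split into several columns": one list of values per record
    split_rows = [cell.split(separator) for cell in joined]
    # Pad shorter rows with blanks so every row has the same width
    width = max((len(row) for row in split_rows), default=0)
    return [row + [""] * (width - len(row)) for row in split_rows]


# Hypothetical records: the first has no synopsis, the second does.
sample_records = [
    ["Description Title A", "Book", "Author A", "2001"],
    ["Description Title B", "Book", "A short synopsis", "Author B", "2002"],
]
table = join_and_split(sample_records)
for row in table:
    print(row)
```

The record without a synopsis comes out one value short, so its last column is blank.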

You should now have one row per title. You'll still have a little work to do, as the data in rows where there is a synopsis will be shifted across by one column compared to the rows where there is no synopsis. To fix this I'd suggest a 'facet by blank' on the last column - the non-synopsis rows should be empty in the last column as they have one fewer value.

You can then use transformations to shift the values across columns one by one (starting at the empty column, otherwise you'll overwrite data).
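
The order of the shift matters, so here is a small Python sketch of it. It assumes five columns in the order title, document type, synopsis, author, date, and that a blank last column marks a row without a synopsis; the function name align_row and the sample rows are hypothetical. It copies values rightwards starting from the empty last column, so nothing is overwritten before it has been moved.

```python
def align_row(row, synopsis_index=2):
    """Shift author and date one column right when the synopsis is missing."""
    if row[-1] != "":
        return list(row)  # last column is filled: the row has a synopsis
    fixed = list(row)
    # Work from the empty last column leftwards so no value is lost
    for i in range(len(fixed) - 1, synopsis_index, -1):
        fixed[i] = fixed[i - 1]
    fixed[synopsis_index] = ""  # leave the synopsis column blank
    return fixed


# Hypothetical rows after the split step
without_synopsis = ["Description Title A", "Book", "Author A", "2001", ""]
with_synopsis = ["Description Title B", "Book", "A short synopsis", "Author B", "2002"]

print(align_row(without_synopsis))
print(align_row(with_synopsis))
```

In OpenRefine itself the same thing is done one column at a time with cell transformations, with the 'facet by blank' selection still applied.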

Hope that all makes sense. If you post some example data as Ettore suggests then I could do a screencast to illustrate.

Owen



Source: https://stackoverflow.com/questions/46489840/openrefine-transpose-rows-into-columns-based-on-text
