OpenRefine - Transpose rows into columns based on text

Question

I've received a data dump from a library catalogue, which came out in .txt format. I've been able to get the data into a spreadsheet, but it is all in one column. I would like to transpose the rows into columns.

The data is in this one column in the following order: Title, Document Type, Author, Date

But in some cases, the catalogue records appear in the order: Title, Document Type, Synopsis, Author, Date

Therefore I cannot transpose these records into columns based on the number of rows.

Each title has the word "Description" ahead of it. This is the one regular feature throughout the entire dataset.

Is there a way to use OpenRefine to transpose rows into columns based on the text in a column? That is, to transpose the rows from the one containing "Description" up to the next instance of the word "Description"?

Answer 1:

The approach I'd suggest is to group your rows into OpenRefine 'records', as follows:

Import the data into OpenRefine as it is
Write a 'custom text facet' with the GREL expression value.startsWith("Description")
Select the rows for which this facet shows 'true' - this should give you all the rows containing titles
Still with this facet choice applied, use 'add column based on this column' to add a new column which contains just the titles
Move this new column to the start (left-hand side) of your project
Switch to 'Records' mode

You should now see that you have a single record for each set of rows that relate to the same title. You can now use the "Join multi-valued cells" option on the original column to get the title, document type, synopsis (if it exists), author, and date into a single cell. Choose a separator that does not occur in your data.

Now use 'split into several columns' to split the joined values across columns, using the same separator.
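For anyone who wants to check the grouping logic outside OpenRefine, here is a minimal Python sketch of the same idea: start a new record at every line beginning with "Description" and collect the lines that follow into that record. It is an illustration only, not what OpenRefine runs internally; the function name `group_records` and the sample values are made up for the example.

```python
def group_records(lines, marker="Description"):
    """Group a flat list of cell values into records.

    A new record starts at every line that begins with `marker`;
    all following lines belong to it until the next marker line.
    """
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank cells
        if text.startswith(marker):
            records.append([text])  # marker line opens a new record
        elif records:
            records[-1].append(text)  # attach to the current record
        # lines before the first marker are ignored
    return records


# Hypothetical sample data: one record without a synopsis, one with.
sample = [
    "Description Title A",
    "Book",
    "Author A",
    "1990",
    "Description Title B",
    "Book",
    "Synopsis B",
    "Author B",
    "2001",
]

rows = group_records(sample)
for row in rows:
    print(row)
```

Each inner list corresponds to one row after the join and split steps, so the record with a synopsis comes out one value longer than the one without.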

You should now have one row per title. You'll still have a little work to do, as the data in rows where there is a synopsis will be shifted across by one column compared to the rows where there is no synopsis. To fix this I'd suggest a 'facet by blank' on the last column: the non-synopsis rows should be empty in the last column, as they have one fewer value.

You can then use transformations to shift the values across columns one by one (starting at the empty column, otherwise you'll overwrite data).
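The order of the shifts matters, which is easier to see in code. The sketch below is a hedged Python illustration, assuming five columns in the order title, document type, synopsis, author, date, and assuming rows without a synopsis have an empty last cell after the split. The function name `align_row` is hypothetical.

```python
def align_row(row):
    """Shift author and date one column right for rows without a synopsis.

    Expects a list of five cells. A row with no synopsis arrives as
    [title, doc_type, author, date, ""] and is returned as
    [title, doc_type, "", author, date].
    """
    cells = list(row)  # copy so the input is left untouched
    if cells[4] == "":
        # Start at the empty column so nothing is overwritten.
        cells[4] = cells[3]  # date moves into the empty last column
        cells[3] = cells[2]  # author moves into the old date column
        cells[2] = ""        # synopsis column is left blank
    return cells


# Hypothetical rows as they might look after the split step.
short_row = ["Title A", "Book", "Author A", "1990", ""]
full_row = ["Title B", "Book", "Synopsis B", "Author B", "2001"]

print(align_row(short_row))
print(align_row(full_row))
```

Doing the copies in the opposite order would write the author over the date before the date had been moved, which is the overwrite the answer warns about.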

Hope that all makes sense. If you post some example data, as Ettore suggests, then I could do a screencast to illustrate.

Owen

Source: https://stackoverflow.com/questions/46489840/openrefine-transpose-rows-into-columns-based-on-text

Tags

openrefine