Transforming a nested data frame with varying number of elements

问题

I have a data frame with a column of nested data frames with 1 or 2 columns and n rows. It looks like df in the sample below:

'data.frame':   3 obs. of  2 variables:
 $ vector:List of 3
  ..$ : chr "p1"
  ..$ : chr "p2"
  ..$ : chr "p3"
 $ lists :List of 3
  ..$ :'data.frame':    2 obs. of  2 variables:
  .. ..$ n1: Factor w/ 2 levels "a","b": 1 2
  .. ..$ n2: Factor w/ 2 levels "1","2": 1 2
  ..$ :'data.frame':    1 obs. of  1 variable:
  .. ..$ n1: Factor w/ 1 level "d": 1
  ..$ :'data.frame':    1 obs. of  2 variables:
  .. ..$ n1: Factor w/ 1 level "e": 1
  .. ..$ n2: Factor w/ 1 level "3": 1

df can be recreated like this :

v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")), data.frame(n1 = "d"), data.frame(n1 = "e", n2 = "3"))
df <- as.data.frame(cbind(v, l))

I'd like to transform it to a data frame that looks like that:

[v] [n1] [n2]

p1  a  1

p1  b  2

p2  d  NA

p3  e  3

n1 and n2 are in seperate columns
if the data frame in row i has n rows, the vector element of row i should be repeated n times
if there is no content in n1 or n2, there should be a NA

I've tried using tidyr::unnest but got the following error

 unnest(df)
Error: All nested columns must have the same number of elements.

Does anyone has a better idea how to transform the dataframe in the desired format?

回答1:

This will avoid by-row operations, which will be important if you have a lot of rows.

library(data.table)

rbindlist(df$l, fill = T, id = 'row')[, v := df$v[row]][]
#   row n1 n2  v
#1:   1  a  1 p1
#2:   1  b  2 p1
#3:   2  d NA p2
#4:   3  e  3 p3

回答2:

Using purrr::pmap_df, within each row of df, we combine v and l into a single data frame and then combine all of the data frames into a single data frame.

library(tidyverse)

pmap_df(df, function(v,l) {
  data.frame(v,l)
})

   v n1   n2
1 p1  a    1
2 p1  b    2
3 p2  d <NA>
4 p3  e    3

回答3:

A solution using dplyr and tidyr. suppressWarnings is not required. Because when you created data frames, there are factor columns, suppressWarnings is to suppress the warning message when combining factors.

library(dplyr)
library(tidyr)

df1 <- suppressWarnings(df %>%
  mutate(v = unlist(.$v)) %>%
  unnest())
df1
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3

来源：https://stackoverflow.com/questions/47722233/transforming-a-nested-data-frame-with-varying-number-of-elements

标签

dataframe

tidyr