Transforming a nested data frame with varying number of elements

旧街凉风 submitted on 2020-02-02 04:38:06

Question


I have a data frame with a list column of nested data frames, each with 1 or 2 columns and n rows. It looks like df in the sample below:

'data.frame':   3 obs. of  2 variables:
 $ vector:List of 3
  ..$ : chr "p1"
  ..$ : chr "p2"
  ..$ : chr "p3"
 $ lists :List of 3
  ..$ :'data.frame':    2 obs. of  2 variables:
  .. ..$ n1: Factor w/ 2 levels "a","b": 1 2
  .. ..$ n2: Factor w/ 2 levels "1","2": 1 2
  ..$ :'data.frame':    1 obs. of  1 variable:
  .. ..$ n1: Factor w/ 1 level "d": 1
  ..$ :'data.frame':    1 obs. of  2 variables:
  .. ..$ n1: Factor w/ 1 level "e": 1
  .. ..$ n2: Factor w/ 1 level "3": 1

df can be recreated like this:

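v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")), data.frame(n1 = "d"), data.frame(n1 = "e", n2 = "3"))
# cbind() of a character vector and a list gives a list-matrix, so
# as.data.frame() turns both v and l into list columns
df <- as.data.frame(cbind(v, l))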
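Because cbind() with a list argument returns a list-matrix, both v and l end up as list columns in df. A minimal alternative sketch, assuming you are free to build the data frame yourself, keeps v as an ordinary character column and stores only the nested data frames as a list column (the name df2 is hypothetical):

```r
v <- c("p1", "p2", "p3")
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(n1 = "d", stringsAsFactors = FALSE),
  data.frame(n1 = "e", n2 = "3", stringsAsFactors = FALSE)
)

# Start from the atomic column only
df2 <- data.frame(v = v, stringsAsFactors = FALSE)
# Assigning a list with one element per row via $<- creates a list column
df2$l <- l

str(df2, max.level = 1)
```

With v atomic, only l needs to be expanded when flattening.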

I'd like to transform it into a data frame that looks like this:

v   n1  n2
p1  a   1
p1  b   2
p2  d   NA
p3  e   3
  • n1 and n2 are in separate columns
  • if the data frame in row i has n rows, the vector element of row i should be repeated n times
  • if there is no content in n1 or n2, there should be an NA
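The three requirements above can also be met in base R without extra packages. The sketch below is one possible approach and is not taken from the thread; it assumes v is a plain character vector and l a list of data frames as in the recreation code, and the helper name flatten_nested() is hypothetical:

```r
v <- c("p1", "p2", "p3")
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(n1 = "d", stringsAsFactors = FALSE),
  data.frame(n1 = "e", n2 = "3", stringsAsFactors = FALSE)
)

# Hypothetical helper: one output row per row of each nested data frame
flatten_nested <- function(v, l) {
  # Union of all column names that appear in any nested data frame
  cols <- unique(unlist(lapply(l, names)))
  pieces <- Map(function(vi, d) {
    # Add any missing column, filled with NA (rule 3)
    for (cl in setdiff(cols, names(d))) d[[cl]] <- NA_character_
    # Repeat vi once per row of d (rule 2); n1 and n2 stay separate (rule 1)
    data.frame(v = rep(vi, nrow(d)), d[cols], stringsAsFactors = FALSE)
  }, v, l)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- flatten_nested(v, l)
res
```

This loops over the nested data frames, so for very many rows a vectorized approach may be preferable.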

I've tried using tidyr::unnest() but got the following error:

 unnest(df)
Error: All nested columns must have the same number of elements.
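One plausible reading of this error, assuming the df built by the recreation code: called without arguments, unnest() (as it behaved in tidyr before 1.0) expands every list column in parallel, and here v holds one element per row while the data frame in row 1 of l has two rows. A small diagnostic sketch (not part of tidyr) makes the mismatch visible:

```r
v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")),
          data.frame(n1 = "d"),
          data.frame(n1 = "e", n2 = "3"))
df <- as.data.frame(cbind(v, l))

# Elements per row in the list column v (each entry is a length-1 character)
v_counts <- lengths(df$v)
# Rows per nested data frame in the list column l
l_counts <- vapply(df$l, nrow, integer(1))

# Rows where the two list columns disagree
mismatch <- which(v_counts != l_counts)
mismatch
```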

Does anyone have a better idea of how to transform the data frame into the desired format?


Answer 1:


This avoids by-row operations, which matters if you have a lot of rows.

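library(data.table)

# Stack the nested data frames: fill = TRUE pads missing columns with NA and
# idcol = "row" records which element of df$l each stacked row came from;
# that row id then indexes into v
rbindlist(df$l, fill = TRUE, idcol = "row")[, v := unlist(df$v)[row]][]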
#   row n1 n2  v
#1:   1  a  1 p1
#2:   1  b  2 p1
#3:   2  d NA p2
#4:   3  e  3 p3
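The key idea in this answer is that the id column records, for every stacked row, the position of the nested data frame it came from, so v can be expanded with one vectorized lookup instead of a loop. The same index can be computed in base R; a minimal sketch, assuming v is the plain character vector and l the list of nested data frames from the recreation code (row_id and v_expanded are hypothetical names):

```r
v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")),
          data.frame(n1 = "d"),
          data.frame(n1 = "e", n2 = "3"))

# Element i of l contributes nrow(l[[i]]) copies of i
row_id <- rep(seq_along(l), vapply(l, nrow, integer(1)))

# A single vectorized lookup replaces a per-row loop over v
v_expanded <- v[row_id]
row_id
v_expanded
```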



Answer 2:


Using purrr::pmap_df(), we build one data frame per row of df from v and l, and pmap_df() then row-binds all of them into a single data frame.

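library(tidyverse)

# v and l of each row of df are passed as arguments; data.frame(v, l)
# repeats v for every row of l, and pmap_df() row-binds the results
pmap_df(df, function(v, l) {
  data.frame(v, l)
})
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3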
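This works because data.frame() recycles a length-1 argument to the number of rows of the other arguments, so the single v value is repeated once per row of the nested data frame; pmap_df() then row-binds the pieces and fills the missing n2 with NA. A minimal base R illustration of the recycling step alone, using the nested data frame from the first row of df (nested and piece are hypothetical names):

```r
# Nested data frame from the first row of df
nested <- data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE)

# "p1" has length 1, so data.frame() repeats it for both rows of nested
piece <- data.frame(v = "p1", nested, stringsAsFactors = FALSE)
piece
```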



Answer 3:


A solution using dplyr and tidyr. The suppressWarnings() call is not required: because the nested data frames were created with factor columns, combining them produces warnings about the factors, and suppressWarnings() merely silences those messages.

library(dplyr)
library(tidyr)

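df1 <- suppressWarnings(df %>%
  mutate(v = unlist(.$v)) %>%  # turn the list column v into a plain character vector
  unnest())                    # expand the remaining list column l into rows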
df1
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3
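If you would rather not hide warnings, one alternative (not part of the original answer) is to convert the factor columns of every nested data frame to character before combining them, so there are no factor levels to reconcile. Note that since R 4.0.0, data.frame() no longer creates factors by default, so the recreation code already yields character columns there. A minimal base R sketch, assuming l is the list of nested data frames; the helper name factors_to_character() is hypothetical:

```r
# Build factor columns explicitly, as data.frame() did by default before R 4.0.0
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = TRUE),
          data.frame(n1 = "d", stringsAsFactors = TRUE),
          data.frame(n1 = "e", n2 = "3", stringsAsFactors = TRUE))

# Hypothetical helper: replace every factor column with its character labels
factors_to_character <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], as.character)
  d
}

l_chr <- lapply(l, factors_to_character)
str(l_chr)
```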


Source: https://stackoverflow.com/questions/47722233/transforming-a-nested-data-frame-with-varying-number-of-elements
