Split a list column into multiple columns in R

…衆ロ難τιáo~ submitted on 2021-01-21 12:23:26

Question


I have a data table where the last column is a column of lists. Below is how it looks:

Col1 | Col2 | ListCol
--------------------------
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]

What I want is

Col1 | Col2 | Col3  | Col4
--------------------------
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2

I know that all the lists have the same number of elements.

Edit:

Every element in ListCol is a list with two elements.


Answer 1:


Here is one approach, using tidyr::unnest and tidyr::spread (note that spread() has since been superseded by pivot_wider()):

library(dplyr)
library(tidyr)

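# example data: b is a list column of length-2 numeric vectors
df <- tibble(a=c(1, 2, 3), b=list(c(2, 3), c(4, 5), c(6, 7)))

df %>% unnest(b) %>%                # one row per element of b
       group_by(a) %>% 
       mutate(col=seq_along(a)) %>% # add a column indicator (1, 2 within each a)
       spread(key=col, value=b)     # one output column per indicator value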

      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.
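For comparison, here is a minimal base R sketch of the same long-then-wide idea, assuming no packages are available (it is not part of the original answer): the list column is expanded to one row per element with a position counter, then widened with `reshape()`. The names `long`, `col`, and `wide` are illustrative choices.

```r
# Same example data as above, as a plain data.frame with a list column
df <- data.frame(a = c(1, 2, 3))
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# Long form: one row per element plus a position counter
# (the equivalent of unnest() followed by the seq_along() indicator)
long <- data.frame(
  a   = rep(df$a, lengths(df$b)),
  col = sequence(lengths(df$b)),
  b   = unlist(df$b)
)

# Wide form: one column per position (the equivalent of spread(key = col, value = b))
wide <- reshape(long, idvar = "a", timevar = "col", direction = "wide")
wide
```

The new columns come out named `b.1` and `b.2`, following `reshape()`'s default `sep = "."`.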



Answer 2:


Currently, the tidyverse answer would be:

library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol, names_sep = "_")  # recent tidyr versions require names_sep when list elements are unnamed
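If adding packages is not an option, a rough base R counterpart of `unnest_wider()` can be sketched as below. `split_list_col()` is a hypothetical helper written for this illustration (its `names_sep` argument only imitates tidyr's naming), and it assumes, as the question states, that every element of the list column has the same length.

```r
# Hypothetical helper (not part of tidyr): split list column `col` into one
# column per element; assumes all elements have the same length
split_list_col <- function(df, col, names_sep = "_") {
  # unlist() flattens each element to an atomic vector; rbind stacks them as rows
  mat <- do.call(rbind, lapply(df[[col]], unlist))
  colnames(mat) <- paste(col, seq_len(ncol(mat)), sep = names_sep)
  # Keep every other column, then append the new ones at the end
  data.frame(df[setdiff(names(df), col)], mat, stringsAsFactors = FALSE)
}

# Data shaped like the question: each ListCol element is a two-element list
data <- data.frame(Col1 = rep(NA, 3), Col2 = rep(NA, 3))
data$ListCol <- rep(list(list("obj1", "obj2")), 3)

split_list_col(data, "ListCol")
```

Because `unlist()` flattens each element into one atomic vector, mixed element types are coerced to a common type (for example, everything becomes character), whereas `unnest_wider()` gives each new column its own type.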



Answer 3:


Here's an option with data.table and base::unlist.

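library(data.table)

# 'a' is itself a list column here; each element of 'b' is a two-element list
DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))

# For each row, unwrap the nested list in 'b' and write its two elements
# into columns b1 and b2 by reference (set() creates them on first use)
for (i in seq_len(nrow(DT))) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT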

This requires a for loop over every row, which is not ideal and very anti-data.table. I wonder if there is some way to avoid creating the list column in the first place.
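One way to avoid the per-row loop, sketched here in base R rather than data.table and not benchmarked: loop over the two output columns instead, letting `vapply()` extract the k-th element from every row in a single call. The resulting vectors could be assigned to a data.table with `set()` or `:=` in the same way.

```r
# Same nested structure as above, in a plain data.frame (no data.table needed)
df <- data.frame(a = c(1, 2, 3))
df$b <- list(list(1, 2), list(2, 1), list(1, 1))

# Loop over the output columns, not the rows: each vapply() call pulls the
# k-th element out of every row at once and checks it is a single number
for (k in 1:2) {
  df[[paste0("b", k)]] <- vapply(df$b, function(x) x[[k]], numeric(1))
}
df[c("a", "b1", "b2")]
```

`vapply()` with `numeric(1)` stops with an error if any row's k-th element is not a single number, so malformed rows fail loudly instead of silently producing a list.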




Answer 4:


Comparison of two great answers

There are two great one-liner suggestions:

(1) cbind(df[1],t(data.frame(df$b)))

This is from @Onyambu, using base R. Arriving at it requires knowing that a data frame is a list of columns, plus a bit of creativity.
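To see why this one-liner works, here it is unpacked step by step on the same example data as Answer 1 (the intermediate names `cols`, `rows`, and `out` are just for illustration). It relies on every vector having the same length, which the question guarantees.

```r
df <- data.frame(a = c(1, 2, 3))
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# A list of equal-length vectors converts directly to a data.frame: each
# vector becomes a column, giving a 2 x 3 data frame (one column per original row)
cols <- data.frame(df$b)
dim(cols)

# Transposing turns it into a 3 x 2 numeric matrix: one row per original row
rows <- t(cols)

# Bind the matrix next to the columns we want to keep
out <- cbind(df[1], rows)
out
```

The result may carry row names derived from the deparsed vectors (such as `c.2..3.`); `rownames(out) <- NULL` resets them.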

(2) df %>% unnest_wider(b, names_sep = "_")

This is from @iago, using the tidyverse. It needs extra packages and familiarity with tidyr's nest/unnest verbs, but it is arguably more readable.

Now let's compare their performance:

library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

tidy_foo <- function() df %>% unnest_wider(b, names_sep = "_")  # names_sep is required for unnamed elements in recent tidyr
base_foo <- function() cbind(df[1], t(data.frame(df$b))) %>% as_tibble()  # convert to a tibble so both return the same class
  
microbenchmark(tidy_foo(), base_foo())

Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100

Ouch!

The base R solution is roughly 20 times faster here (median 4.92 ms vs. 109.39 ms for N = 100).



Source: https://stackoverflow.com/questions/50881440/split-a-list-column-into-multiple-columns-in-r
