Question

I have a data table where the last column is a column of lists. Below is how it looks:

Col1 | Col2 | ListCol
--------------------------
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]

What I want is

Col1 | Col2 | Col3  | Col4
--------------------------
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2

I know that all the lists have the same number of elements.

Edit:

Every element in ListCol is a list with two elements.

Answer 1:

Here is one approach, using unnest and tidyr::spread...

library(dplyr)
library(tidyr)

#example df
df <- tibble(a=c(1, 2, 3), b=list(c(2, 3), c(4, 5), c(6, 7)))

df %>% unnest(b) %>% 
       group_by(a) %>% 
       mutate(col=seq_along(a)) %>% #add a column indicator
       spread(key=col, value=b)

      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.
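
As a side note, tidyr now documents spread() as superseded by pivot_wider(). A minimal sketch of the same idea with pivot_wider(), assuming tidyr 1.0.0 or later; the column names b1 and b2 are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# same example data as above
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

wide <- df %>%
  unnest(b) %>%                                 # one row per list element
  group_by(a) %>%
  mutate(col = paste0("b", row_number())) %>%   # position within each original row
  ungroup() %>%
  pivot_wider(names_from = col, values_from = b)

wide
```

Like the spread() version, this relies on `a` uniquely identifying each original row.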

Answer 2:

Currently, the tidyverse answer would be:

library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol)
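
If the list elements are unnamed, as in the question, unnest_wider() has no names to use for the new columns. As far as I know, recent tidyr versions expect names_sep in that case. A hedged sketch, where `data` is a made-up stand-in for the table in the question:

```r
library(tibble)
library(tidyr)

# hypothetical stand-in for the question's table
data <- tibble(
  Col1 = rep(NA, 3),
  Col2 = rep(NA, 3),
  ListCol = list(c("obj1", "obj2"), c("obj1", "obj2"), c("obj1", "obj2"))
)

# names_sep builds column names from the outer name and the element position
res <- unnest_wider(data, ListCol, names_sep = "_")
res
```

The new columns are named ListCol_1 and ListCol_2; rename them afterwards if Col3 and Col4 are needed.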

Answer 3:

Here's an option with data.table and base::unlist.

library(data.table)

DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))

for (i in 1:nrow(DT)) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT

This requires a for loop over every row, which is not ideal and goes against the data.table idiom. I wonder if there's some way to avoid creating the list column in the first place...
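
One possible way to avoid the row loop, offered as a sketch rather than a definitive answer, is to flatten each inner list and let data.table::transpose() turn the rows into columns. It assumes every inner list holds the same number of atomic values of one type:

```r
library(data.table)

DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))

# lapply(b, unlist) turns each inner list into a plain vector;
# transpose() then returns one vector per new column
DT[, c("b1", "b2") := data.table::transpose(lapply(b, unlist))]
DT
```

The assignment is done by reference with `:=`, so no copy of DT is made.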

Answer 4:

Comparison of two great answers

There are two great one-liner suggestions:

(1) `cbind(df[1],t(data.frame(df$b)))`

This is from @Onyambu, using base R. Arriving at this answer takes a bit of creativity, plus knowing that a data frame is a list.

(2) `df %>% unnest_wider(b)`

This is from @iago, using the tidyverse. It requires extra packages and familiarity with the tidyr nesting verbs, but it is arguably more readable.
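
For readers who find one-liner (1) hard to parse, here is a related base R sketch that binds the list elements row-wise with do.call(rbind, ...). This is a swapped-in variant, not @Onyambu's exact code, and the names b1 and b2 are illustrative:

```r
# plain data.frame with a list column, mirroring the earlier example
df <- data.frame(a = c(1, 2, 3))
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# each list element becomes one row of a matrix
m <- do.call(rbind, df$b)

# attach the matrix columns next to the remaining columns
out <- cbind(df["a"], setNames(as.data.frame(m), c("b1", "b2")))
out
```

It rests on the same assumption as the question: every list has the same length.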

Now let's compare performance:

library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

tidy_foo <- function() suppressMessages(df %>% unnest_wider(b))
base_foo <- function() cbind(df[1],t(data.frame(df$b))) %>% as_tibble # To be fair
  
microbenchmark(tidy_foo(), base_foo())

Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100

Ouch!

The base R solution is about 20 times faster (median 4.9 ms versus 109.4 ms).

Source: https://stackoverflow.com/questions/50881440/split-a-list-column-into-multiple-columns-in-r

Tags

multiple-columns