Question
I have a data table whose last column is a list column. Here is how it looks:
Col1 | Col2 | ListCol
--------------------------
na | na | [obj1, obj2]
na | na | [obj1, obj2]
na | na | [obj1, obj2]
What I want is:
Col1 | Col2 | Col3 | Col4
--------------------------
na | na | obj1 | obj2
na | na | obj1 | obj2
na | na | obj1 | obj2
I know that all the lists have the same number of elements.
Edit:
Every element in ListCol is a list with two elements.
Answer 1:
Here is one approach, using unnest and tidyr::spread:
library(dplyr)
library(tidyr)
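# example df: `b` is a list column with two values per row
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))
df %>%
  unnest(b) %>%                     # one row per list element
  group_by(a) %>%
  mutate(col = seq_along(a)) %>%    # add a column indicator (1, 2) within each group
  spread(key = col, value = b)      # spread the indicator into columns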
      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.
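Since spread has been superseded in tidyr by pivot_wider, the same idea can be sketched with the newer verb. This is a minimal sketch, not part of the original answer: the variable name `wide` and the `names_prefix = "b"` choice are assumptions made for illustration, and, like the code above, it assumes `a` uniquely identifies each row.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

wide <- df %>%
  unnest(b) %>%                       # one row per list element
  group_by(a) %>%
  mutate(col = row_number()) %>%      # position of each element within its list
  ungroup() %>%
  pivot_wider(names_from = col,       # positions become column names
              values_from = b,
              names_prefix = "b")     # illustrative prefix, giving b1 and b2

wide
```

The prefix avoids the backticked numeric column names that the spread version produces.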
Answer 2:
Currently, the tidyverse answer would be:
library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol)
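A small self-contained sketch of this call may help. The tibble `data` below is made-up sample data shaped like the table in the question, and `wide` is an illustrative name. The `names_sep` argument is added here as an assumption: when the list elements are unnamed, recent tidyr versions ask for it so that column names can be generated automatically.

```r
library(tidyr)
library(tibble)

# Made-up data mirroring the question: ListCol holds two elements per row
data <- tibble(
  Col1 = c("x", "y", "z"),
  Col2 = c(1, 2, 3),
  ListCol = list(c("a", "b"), c("c", "d"), c("e", "f"))
)

# names_sep builds column names from the outer name plus the element position
wide <- data %>% unnest_wider(ListCol, names_sep = "_")

wide
```

The new columns are named from the list column and the element index, so they can be renamed afterwards if Col3 and Col4 are preferred.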
Answer 3:
Here's an option with data.table and base::unlist.
library(data.table)
DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))
# Fill the new columns b1 and b2 one row at a time
for (i in seq_len(nrow(DT))) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT
This requires a for loop over every row, which is not ideal and goes against the data.table way of doing things.
I wonder if there's some way to avoid creating the list column in the first place...
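One way the row loop might be avoided is data.table::transpose, which turns a list of equal-length vectors into a list of columns. The following is a sketch under an assumption that differs from the example above: each element of `b` is an atomic vector such as c(1, 2) rather than a nested list.

```r
library(data.table)

# Assumption: each element of the list column is an atomic vector of length 2
DT <- data.table(a = 1:3,
                 b = list(c(1, 2), c(2, 1), c(1, 1)))

# transpose() regroups the i-th elements of every vector into the i-th column;
# := then adds both columns by reference in a single vectorized step
DT[, c("b1", "b2") := data.table::transpose(b)]

DT
```

The explicit data.table:: prefix guards against masking by purrr::transpose when the tidyverse is also loaded.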
Answer 4:
Comparison of two great answers
There are two great one-liner suggestions:
(1) cbind(df[1],t(data.frame(df$b)))
This is from @Onyambu, using base R. To arrive at this answer you need to know that a data frame is a list, plus a bit of creativity.
(2) df %>% unnest_wider(b)
This is from @iago, using the tidyverse. You need extra packages and have to know the nest verbs, but it is arguably more readable.
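To see why one-liner (1) works, it can help to unpack it step by step. The sketch below uses only base R and a small made-up data frame. The intermediate names `cols`, `wide`, and `res` are illustrative and do not appear in the original answers.

```r
# Small made-up example: a data frame with a list column `b`
df <- data.frame(a = 1:3)
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# A list of equal-length vectors becomes a data frame
# with one column per list element
cols <- data.frame(df$b)    # 2 rows, 3 columns

# Transposing gives one row per list element
wide <- t(cols)             # 3 x 2 numeric matrix

# Bind the remaining column(s) back on
res <- cbind(df[1], wide)

res
```

The generated row and column names are not pretty, so in practice they would probably be reset with names() and rownames() afterwards.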
Now let's compare performance:
library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)
N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))
tidy_foo <- function() suppressMessages(df %>% unnest_wider(b))
base_foo <- function() cbind(df[1], t(data.frame(df$b))) %>% as_tibble()  # convert to a tibble so both functions return the same class
microbenchmark(tidy_foo(), base_foo())
Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100
Ouch!
The base R solution is roughly 20 times faster (comparing medians).
Source: https://stackoverflow.com/questions/50881440/split-a-list-column-into-multiple-columns-in-r