Question
I have a data table where the last column is a column of lists. Below is how it looks:
Col1 | Col2 | ListCol
--------------------------
na | na | [obj1, obj2]
na | na | [obj1, obj2]
na | na | [obj1, obj2]
What I want is
Col1 | Col2 | Col3 | Col4
--------------------------
na | na | obj1 | obj2
na | na | obj1 | obj2
na | na | obj1 | obj2
I know that all the lists have the same number of elements.
Edit:
Every element in ListCol is a list with two elements.
Answer 1:
Here is one approach, using unnest and tidyr::spread...
library(dplyr)
library(tidyr)
#example df
df <- tibble(a=c(1, 2, 3), b=list(c(2, 3), c(4, 5), c(6, 7)))
df %>%
  unnest(b) %>%
  group_by(a) %>%
  mutate(col = seq_along(a)) %>% # add a column indicator
  spread(key = col, value = b)
      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.
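As a side note, spread has since been superseded in tidyr by pivot_wider. A minimal sketch of the same reshaping with it, assuming tidyr 1.0.0 or later; the names_prefix value "b" and the object name wide are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# same example df as above
df <- tibble(a = c(1, 2, 3), b = list(c(2, 3), c(4, 5), c(6, 7)))

wide <- df %>%
  unnest(b) %>%
  group_by(a) %>%
  mutate(col = row_number()) %>%   # position of each value within its list
  ungroup() %>%
  pivot_wider(names_from = col, values_from = b, names_prefix = "b")
```

This should give one row per value of a, with the list elements spread into columns b1 and b2 instead of the backticked `1` and `2`.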
Answer 2:
Currently, the tidyverse answer would be:
library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol)
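One thing to watch for: when the list elements are unnamed, as in the question, unnest_wider has no names to use for the new columns. A minimal sketch that passes names_sep so the columns get numbered; the example data is made up to mirror the question, and as far as I know the behaviour without names_sep differs between tidyr versions:

```r
library(dplyr)
library(tidyr)

# hypothetical data shaped like the table in the question
data <- tibble(
  Col1 = c("x", "y", "z"),
  Col2 = c(1, 2, 3),
  ListCol = list(c(10, 20), c(30, 40), c(50, 60))
)

# names_sep pastes the outer column name and the element index together
wide <- data %>% unnest_wider(ListCol, names_sep = "_")
```

The new columns should come out as ListCol_1 and ListCol_2, which can then be renamed to Col3 and Col4 if needed.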
Answer 3:
Here's an option with data.table and base::unlist.
library(data.table)
DT <- data.table(a = list(1, 2, 3),
                 b = list(list(1, 2),
                          list(2, 1),
                          list(1, 1)))
# fill b1 and b2 by reference, one row at a time
for (i in seq_len(nrow(DT))) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT
This requires a for loop over every row... Not ideal and very anti-data.table.
I wonder if there's some way to avoid creating the list column in the first place...
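The row loop can probably be avoided with data.table::transpose, which turns a list of per-row vectors into a list of per-column vectors. A sketch under the assumption that every element of b flattens to an atomic vector of the same length; the column names b1 and b2 are taken from the loop above, and a is simplified to a plain integer column here:

```r
library(data.table)

DT <- data.table(a = 1:3,
                 b = list(list(1, 2), list(2, 1), list(1, 1)))

# flatten each row's inner list to a plain vector, then transpose so that
# the first elements form b1 and the second elements form b2
DT[, c("b1", "b2") := transpose(lapply(b, unlist))]
```

The assignment happens by reference in a single call, so there is no per-row overhead from set.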
Answer 4:
Comparison of two great answers
There are two great one-liner suggestions:
(1) cbind(df[1],t(data.frame(df$b)))
This is from @Onyambu, using base R. To arrive at this answer you need to know that a data frame is a list, plus a bit of creativity.
(2) df %>% unnest_wider(b)
This is from @iago, using the tidyverse. You need extra packages and have to know the nest verbs, but it is arguably more readable.
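To make the "a data frame is a list" remark concrete, here is the base R one-liner written out in steps. This is only a sketch: the intermediate names cols, wide and result and the column names b1 and b2 are illustrative and not part of the original answer.

```r
df <- data.frame(a = 1:3)
df$b <- list(c(2, 3), c(4, 5), c(6, 7))

# data.frame() treats each list element as a column: 2 rows x 3 columns
cols <- data.frame(df$b)

# t() flips that into a 3 x 2 matrix, one row per original row
wide <- t(cols)
dimnames(wide) <- list(NULL, c("b1", "b2"))  # replace the auto-generated names

# bind the reshaped values back onto the remaining column
result <- cbind(df["a"], wide)
```

The trick relies on every list element having the same length; otherwise data.frame() would refuse to build the intermediate table.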
Now let's compare performance
library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)
N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))
tidy_foo <- function() suppressMessages(df %>% unnest_wider(b))
base_foo <- function() cbind(df[1],t(data.frame(df$b))) %>% as_tibble # To be fair
microbenchmark(tidy_foo(), base_foo())
Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100
Ouch! The base R solution is about 20 times faster.
Source: https://stackoverflow.com/questions/50881440/split-a-list-column-into-multiple-columns-in-r