Question
I often find questions where people have somehow ended up with an unnamed list of unnamed character vectors and want to bind them row-wise into a data.frame. Here is an example:
library(magrittr)
data <- cbind(LETTERS[1:3],1:3,4:6,7:9,c(12,15,18)) %>%
split(1:3) %>% unname
data
#[[1]]
#[1] "A" "1" "4" "7" "12"
#
#[[2]]
#[1] "B" "2" "5" "8" "15"
#
#[[3]]
#[1] "C" "3" "6" "9" "18"
One typical approach uses do.call from base R.
do.call(rbind, data) %>% as.data.frame
# V1 V2 V3 V4 V5
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
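To make the base R route concrete, here is a minimal sketch of what do.call(rbind, data) does. It rebuilds the same example list without magrittr and compares the do.call result with an explicit rbind() call; the names m1, m2 and df are only for illustration.

```r
# Rebuild the example list from the question without the pipe
data <- unname(split(cbind(LETTERS[1:3], 1:3, 4:6, 7:9, c(12, 15, 18)), 1:3))

# do.call() passes each list element to rbind() as a separate argument,
# so the two calls below build the same character matrix
m1 <- do.call(rbind, data)
m2 <- rbind(data[[1]], data[[2]], data[[3]])
identical(m1, m2)

# as.data.frame() then supplies the default column names V1, V2, ...
df <- as.data.frame(m1)
names(df)
```

Note that every column stays character (or factor on R versions before 4.0), because the intermediate object is a character matrix.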
A possibly less efficient approach uses Reduce from base R.
Reduce(rbind,data, init = NULL) %>% as.data.frame
# V1 V2 V3 V4 V5
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
However, with more modern packages such as dplyr or data.table, some of the approaches that immediately come to mind don't work, because the vectors are unnamed or are not lists.
library(dplyr)
bind_rows(data)
#Error: Argument 1 must have names
library(data.table)
rbindlist(data)
#Error in rbindlist(data) :
# Item 1 of input is not a data.frame, data.table or list
One approach might be to call set_names on the vectors.
library(purrr)
map_df(data, ~set_names(.x, seq_along(.x)))
# A tibble: 3 x 5
# `1` `2` `3` `4` `5`
# <chr> <chr> <chr> <chr> <chr>
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
However, this seems like more steps than should be necessary.
Therefore, my question is: what is an efficient tidyverse or data.table approach to binding an unnamed list of unnamed character vectors into a data.frame row-wise?
Answer 1:
Not entirely sure about efficiency, but a compact option using purrr and tibble could be:
map_dfc(purrr::transpose(data), ~ unlist(tibble(.)))
V1 V2 V3 V4 V5
<chr> <chr> <chr> <chr> <chr>
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Answer 2:
Edit
Use @sindri_baldur's approach: https://stackoverflow.com/a/61660119/8583393
A way with data.table, similar to what @tmfmnk showed:
library(data.table)
# Qualified because purrr also exports transpose() and may mask this one
as.data.table(data.table::transpose(data))
# V1 V2 V3 V4 V5
#1: A 1 4 7 12
#2: B 2 5 8 15
#3: C 3 6 9 18
Answer 3:
library(data.table)
# Qualified because purrr also exports transpose() and may mask this one
setDF(data.table::transpose(data))
V1 V2 V3 V4 V5
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Answer 4:
This seems rather compact. I believe this is what powers bind_rows() from dplyr and therefore map_df() in purrr, so it should be fairly efficient.
library(vctrs)
vec_rbind(!!!data)
This gives a data.frame.
...1 ...2 ...3 ...4 ...5
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Some Benchmarks
It seems like the .name_repair step within the tidyverse methods is a severe bottleneck. I took a few fairly straightforward options that also seemed to run the quickest from the other posts (thanks H 1 and sindri_baldur).
library(microbenchmark)
microbenchmark(vctrs = vec_rbind(!!!data),
               dt = rbindlist(lapply(data, as.list)),
               map = map_df(data, as_tibble_row, .name_repair = "unique"),
               base = as.data.frame(do.call(rbind, data)))
But if you first name the vectors (but not necessarily the list elements), you get a different story.
data2 <- modify(data, ~set_names(.x, seq(.x)))
microbenchmark(vctrs = vec_rbind(!!!data2),
dt = rbindlist(lapply(data2, as.list)),
map = map_df(data2, as_tibble_row),
base = as.data.frame(do.call(rbind, data2)))
In fact, you can include the time to name the vectors in the vec_rbind() solution but not in the others, and still see fairly high performance.
microbenchmark(vctrs = vec_rbind(!!!modify(data, ~set_names(.x, seq(.x)))),
dt = setDF(transpose(data)),
map = map_df(data2, as_tibble_row),
base = as.data.frame(do.call(rbind, data)))
For what it's worth.
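The benchmarks above run on the three-row example, where fixed per-call overhead is likely to dominate. As a rough sketch, the same comparison can be repeated on a larger list. The object name big, the size n = 1000 and times = 20 are arbitrary assumptions for illustration, not part of the original benchmark.

```r
library(microbenchmark)
library(data.table)

# Hypothetical larger input: n unnamed character vectors of length 5
set.seed(1)
n <- 1000
big <- replicate(n, as.character(sample(100, 5)), simplify = FALSE)

# Compare two of the approaches from this thread on the larger list
res <- microbenchmark(
  base = as.data.frame(do.call(rbind, big)),
  dt   = setDF(data.table::transpose(big)),
  times = 20
)
res
```

Timings depend on the machine and on package versions, so none are quoted here.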
Answer 5:
An option with unnest_wider
library(tibble)
library(tidyr)
library(stringr)
tibble(col = data) %>%
unnest_wider(c(col), names_repair = ~ str_c('value', seq_along(.)))
# A tibble: 3 x 5
# value1 value2 value3 value4 value5
# <chr> <chr> <chr> <chr> <chr>
#1 A 1 4 7 12
#2 B 2 5 8 15
#3 C 3 6 9 18
Answer 6:
My approach would be to simply turn those list entries into the type rbindlist expects (lists):
rbindlist(lapply(data, as.list))
# V1 V2 V3 V4 V5
# <char> <char> <char> <char> <char>
#1: A 1 4 7 12
#2: B 2 5 8 15
#3: C 3 6 9 18
If you want the columns converted from character to appropriate types, lapply can help here as well. The first lapply is called for every row, the second for every column.
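# as.is = FALSE is set explicitly so that strings become factors, as in the output below
rbindlist(lapply(data, as.list))[, lapply(.SD, type.convert, as.is = FALSE)]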
V1 V2 V3 V4 V5
<fctr> <int> <int> <int> <int>
1: A 1 4 7 12
2: B 2 5 8 15
3: C 3 6 9 18
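The two lapply passes can also be written as separate steps, which may make it easier to see what each one does. This is only a sketch: the intermediate names rows, dt and dt_typed are made up for illustration, and as.is = TRUE is an assumption chosen here to keep strings as character rather than factor.

```r
library(data.table)

# Same example list as in the question, built without the pipe
data <- unname(split(cbind(LETTERS[1:3], 1:3, 4:6, 7:9, c(12, 15, 18)), 1:3))

# Step 1 (once per row): turn each character vector into a list,
# which is an input type that rbindlist() accepts
rows <- lapply(data, as.list)
dt <- rbindlist(rows)

# Step 2 (once per column): let type.convert() pick a type for each column
dt_typed <- dt[, lapply(.SD, type.convert, as.is = TRUE)]
sapply(dt_typed, class)
```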
Answer 7:
Here is a slight variation on tmfmnk's suggested approach, using as_tibble_row() to convert the vectors into single-row tibbles. It's also necessary to use the .name_repair argument:
library(purrr)
library(tibble)
map_df(data, as_tibble_row, .name_repair = ~paste0("value", seq(.x)))
# A tibble: 3 x 5
value1 value2 value3 value4 value5
<chr> <chr> <chr> <chr> <chr>
1 A 1 4 7 12
2 B 2 5 8 15
3 C 3 6 9 18
Source: https://stackoverflow.com/questions/61614900/tidyverse-approach-to-binding-unnamed-list-of-unnamed-vectors-by-row-do-callr