Question
I have a number of datasets with identical variable names and want to combine them row-wise (rbind), adding a new column that identifies each row's source dataset by name. For example:
iris0 <- iris
iris1 <- cbind(log(iris[,1:4]),iris[5])
iris2 <- cbind(sqrt(iris[,1:4]),iris[5])
The desired output is the same irisR as the output of
irisR <- rbind(iris0 %>% mutate(DS = "iris0"), iris1 %>% mutate(DS = "iris1"), iris2 %>% mutate(DS = "iris2"))
But I need to do this automatically, because I have many more than three datasets. So my input would just be a character vector such as c("iris0","iris1","iris2"), and I'd get back a data frame like irisR.
(Just base R and dplyr please)
Answer 1:
You can use this little base R function. You pass it the data frames directly; it captures the unevaluated arguments (the names) with match.call(), and lapply then evaluates each name to get the actual data frame and converts the name to a character string, which is stored in a new DS column. The frames are then all stacked with do.call(rbind, ...):
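bind_dfs <- function(...)
{
  # match.call()[-1] drops the function name, leaving the unevaluated arguments
  do.call(rbind, lapply(as.list(match.call()[-1]), function(x) {
    # evaluate the captured name to get the data frame itself
    y <- eval.parent(x)
    # store the name, as a string, in every row
    y$DS <- rep(as.character(x), nrow(y))
    y
  }))
}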
Which you use like this:
bind_dfs(iris0, iris1, iris2)
And to show that it works:
result <- bind_dfs(iris0, iris1, iris2)
head(result)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species DS
#> 1 5.1 3.5 1.4 0.2 setosa iris0
#> 2 4.9 3.0 1.4 0.2 setosa iris0
#> 3 4.7 3.2 1.3 0.2 setosa iris0
#> 4 4.6 3.1 1.5 0.2 setosa iris0
#> 5 5.0 3.6 1.4 0.2 setosa iris0
#> 6 5.4 3.9 1.7 0.4 setosa iris0
tail(result)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species DS
#> 445 2.588436 1.816590 2.387467 1.581139 virginica iris2
#> 446 2.588436 1.732051 2.280351 1.516575 virginica iris2
#> 447 2.509980 1.581139 2.236068 1.378405 virginica iris2
#> 448 2.549510 1.732051 2.280351 1.414214 virginica iris2
#> 449 2.489980 1.843909 2.323790 1.516575 virginica iris2
#> 450 2.428992 1.732051 2.258318 1.341641 virginica iris2
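The function above takes the data frames themselves as arguments, whereas the question starts from a character vector of names. A minimal base R sketch of that variant follows. `bind_named_dfs` and its `envir` argument are hypothetical names introduced here, not part of the original answer, and the sketch assumes every name refers to an existing data frame with the same columns.

```r
# Hypothetical helper: bind data frames given their names as strings.
bind_named_dfs <- function(df_names, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before entering lapply
  pieces <- lapply(df_names, function(nm) {
    df <- get(nm, envir = envir)  # look up the data frame by name
    df$DS <- rep(nm, nrow(df))    # tag every row with its source name
    df
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL           # reset to plain sequential row names
  out
}

# Example data from the question
iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])
iris2 <- cbind(sqrt(iris[, 1:4]), iris[5])

irisR <- bind_named_dfs(c("iris0", "iris1", "iris2"))
table(irisR$DS)
```

Taking the names as strings means the vector can be built programmatically, for example with paste0("iris", 0:2).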
Answer 2:
I wrote this in base R, focusing on the problem of handling many data frames.
# create the character vector "iris1", "iris2", ..., "iris10"
data <- paste0("iris", 1:10)
# use the strings as names for new objects (mtcars columns serve as stand-in data)
for (dat in data) {
  assign(dat, cbind(log(mtcars[, 3:5]), mtcars[, 5]))
}
# create an empty object
irisR <- NULL
# rbind the objects together, one at a time
for (dat in data) {
  irisR <- rbind(irisR, get(dat))
}
str(irisR)
#> 'data.frame': 320 obs. of 4 variables:
#> $ disp : num 5.08 5.08 4.68 5.55 5.89 ...
#> $ hp : num 4.7 4.7 4.53 4.7 5.16 ...
#> $ drat : num 1.36 1.36 1.35 1.12 1.15 ...
#> $ mtcars[, 5]: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
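The loop above stacks the data frames but does not record which dataset each row came from, which the question asks for. A small sketch of one way to add that column inside the same loop is shown below. The column name DS is taken from the question, and `current` is a hypothetical temporary variable introduced only for this illustration.

```r
# Recreate the example objects iris1 ... iris10 (mtcars is only stand-in data)
data <- paste0("iris", 1:10)
for (dat in data) {
  assign(dat, cbind(log(mtcars[, 3:5]), mtcars[, 5]))
}

irisR <- NULL
for (dat in data) {
  current <- get(dat)             # fetch the data frame by its name
  current$DS <- dat               # the single name is recycled to every row
  irisR <- rbind(irisR, current)  # append to the accumulated result
}
str(irisR)
```

Growing a data frame with rbind inside a loop copies the accumulated rows on each iteration, so for a large number of datasets it is usually cheaper to collect the pieces in a list and call do.call(rbind, ...) once.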
Answer 3:
You can use mget to fetch the data frames named in a character vector as a named list, then use bind_rows with the .id argument to store each list element's name in a new column.
get_binded_data <- function(data_name) {
  dplyr::bind_rows(mget(data_name, envir = .GlobalEnv), .id = "DS")
}
get_binded_data(c("iris0","iris1","iris2"))
# DS Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 iris0 5.10 3.50 1.4000 0.200 setosa
#2 iris0 4.90 3.00 1.4000 0.200 setosa
#3 iris0 4.70 3.20 1.3000 0.200 setosa
#4 iris0 4.60 3.10 1.5000 0.200 setosa
#5 iris0 5.00 3.60 1.4000 0.200 setosa
#.....
#151 iris1 1.63 1.25 0.3365 -1.609 setosa
#152 iris1 1.59 1.10 0.3365 -1.609 setosa
#153 iris1 1.55 1.16 0.2624 -1.609 setosa
#154 iris1 1.53 1.13 0.4055 -1.609 setosa
#155 iris1 1.61 1.28 0.3365 -1.609 setosa
#....
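Because envir = .GlobalEnv is hard-coded, the function above only finds data frames stored in the global environment. A hedged sketch of a variant that lets the caller choose where to look is given below. `bind_by_name` and its `envir` argument are hypothetical names introduced here, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical variant: look the names up in the caller's environment by default.
bind_by_name <- function(data_names, envir = parent.frame()) {
  force(envir)
  # inherits = TRUE also searches the enclosing environments of envir
  dfs <- mget(data_names, envir = envir, inherits = TRUE)
  bind_rows(dfs, .id = "DS")  # list names become the DS column
}

# Example data from the question
iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])
iris2 <- cbind(sqrt(iris[, 1:4]), iris[5])

irisR <- bind_by_name(c("iris0", "iris1", "iris2"))
count(irisR, DS)
```

With this default, the function also works when it is called from inside another function whose local variables hold the data frames.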
Source: https://stackoverflow.com/questions/62378965/combine-several-datasets-in-r