Combine several datasets in R

十年热恋 submitted on 2021-02-11 15:15:55

Question


I have a number of datasets with identical variable names and want to combine them by rbind (row-wise) with a new column to identify the dataset by its name. e.g.

iris0 <- iris
iris1 <- cbind(log(iris[,1:4]),iris[5])
iris2 <- cbind(sqrt(iris[,1:4]),iris[5])

The desired output is the irisR produced by:

irisR <- rbind( (iris0 %>% mutate(DS="iris0")), (iris1 %>% mutate(DS="iris1")), (iris2 %>% mutate(DS="iris2")) )

But I need to do this automatically, because I have many more than three datasets. My input would just be a character vector such as c("iris0","iris1","iris2"), and the output would be a data frame like irisR.

(Just base R and dplyr please)


Answer 1:


You can use this small base R function. You pass it the data frames directly; match.call() captures the unevaluated arguments as a list of names, and lapply then evaluates each name (to get the actual data frame) and converts it to a character string for insertion into the data frame itself. The frames are then all stuck together with do.call(rbind, ...):

bind_dfs <- function(...)
{
  do.call(rbind, lapply(as.list(match.call()[-1]), function(x) {
    y <- eval.parent(x)
    y$DS <- rep(as.character(x), nrow(y))
    y}))
}

Which you use like this:

bind_dfs(iris0, iris1, iris2)

And to show that it works:

result <- bind_dfs(iris0, iris1, iris2)

head(result)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species    DS
#> 1          5.1         3.5          1.4         0.2  setosa iris0
#> 2          4.9         3.0          1.4         0.2  setosa iris0
#> 3          4.7         3.2          1.3         0.2  setosa iris0
#> 4          4.6         3.1          1.5         0.2  setosa iris0
#> 5          5.0         3.6          1.4         0.2  setosa iris0
#> 6          5.4         3.9          1.7         0.4  setosa iris0

tail(result)
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species    DS
#> 445     2.588436    1.816590     2.387467    1.581139 virginica iris2
#> 446     2.588436    1.732051     2.280351    1.516575 virginica iris2
#> 447     2.509980    1.581139     2.236068    1.378405 virginica iris2
#> 448     2.549510    1.732051     2.280351    1.414214 virginica iris2
#> 449     2.489980    1.843909     2.323790    1.516575 virginica iris2
#> 450     2.428992    1.732051     2.258318    1.341641 virginica iris2
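
To make the name-capturing step more concrete, here is a minimal sketch of what match.call()[-1] hands to lapply. The helper capture_args is hypothetical and not part of the answer; it only exposes the intermediate list.

```r
# Hypothetical helper (not part of the answer): returns the unevaluated
# arguments in the same form bind_dfs() iterates over.
capture_args <- function(...) as.list(match.call()[-1])

iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])

captured <- capture_args(iris0, iris1)

# Each element is a symbol (class "name"), not a data frame
arg_classes <- vapply(captured, class, character(1))

# as.character() turns each symbol into the label stored in DS
arg_labels <- vapply(captured, as.character, character(1))

# eval() resolves a symbol back to the data frame it refers to
first_df <- eval(captured[[1]])
```

One consequence of this design is that the function expects bare object names. If an argument is an expression such as log(iris0), as.character() returns more than one string, so the DS labels would no longer line up with the rows.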




Answer 2:


I made this in base R, focusing on the problem of handling many data frames:

#create vector "iris1", "iris2", ...
data <- paste0("iris",1:10)

#use string vector as names for new objects
for (dat in data) {
  assign(dat, cbind(log(mtcars[,3:5]),mtcars[,5]))
}
#create an empty object
irisR <- NULL
#rbind object together
for (dat in data) {
  irisR <- rbind(irisR, get(dat))
}

str(irisR)
'data.frame':   320 obs. of  4 variables:
 $ disp       : num  5.08 5.08 4.68 5.55 5.89 ...
 $ hp         : num  4.7 4.7 4.53 4.7 5.16 ...
 $ drat       : num  1.36 1.36 1.35 1.12 1.15 ...
 $ mtcars[, 5]: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
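
The loop above binds the rows but does not record which dataset each row came from. The following is a sketch of one way to add the DS column in base R while still starting from a character vector of names. The helper name bind_by_name and its envir argument are assumptions made for illustration, not part of the answer.

```r
iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])
iris2 <- cbind(sqrt(iris[, 1:4]), iris[5])

# Hypothetical helper: look up each name, tag its rows, then bind once.
bind_by_name <- function(data_names, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before entering lapply
  tagged <- lapply(data_names, function(nm) {
    df <- get(nm, envir = envir)   # fetch the data frame by its name
    df$DS <- rep(nm, nrow(df))     # label every row with that name
    df
  })
  do.call(rbind, tagged)           # a single rbind over the whole list
}

irisR <- bind_by_name(c("iris0", "iris1", "iris2"))
table(irisR$DS)
```

Collecting the tagged frames in a list and calling rbind once avoids growing irisR inside the loop, which can matter when there are many data frames.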



Answer 3:


You can use mget to get data in a named list from character vector and use bind_rows to add the name of the list as a new column.

get_binded_data <- function(data_name) {
   dplyr::bind_rows(mget(data_name, envir = .GlobalEnv), .id = "DS")
}

get_binded_data(c("iris0","iris1","iris2"))

#       DS Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1   iris0         5.10        3.50       1.4000       0.200     setosa
#2   iris0         4.90        3.00       1.4000       0.200     setosa
#3   iris0         4.70        3.20       1.3000       0.200     setosa
#4   iris0         4.60        3.10       1.5000       0.200     setosa
#5   iris0         5.00        3.60       1.4000       0.200     setosa
#.....
#151 iris1         1.63        1.25       0.3365      -1.609     setosa
#152 iris1         1.59        1.10       0.3365      -1.609     setosa
#153 iris1         1.55        1.16       0.2624      -1.609     setosa
#154 iris1         1.53        1.13       0.4055      -1.609     setosa
#155 iris1         1.61        1.28       0.3365      -1.609     setosa
#....
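
As a rough illustration of why this works, the sketch below inspects the list that mget returns before it is passed to bind_rows. Using envir = environment() instead of .GlobalEnv is an assumption made here so that the lookup happens wherever the objects are defined. The dplyr call is guarded so the snippet still runs if dplyr is not installed.

```r
iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])
iris2 <- cbind(sqrt(iris[, 1:4]), iris[5])

data_names <- c("iris0", "iris1", "iris2")

# mget() returns a list whose element names are the object names
frames <- mget(data_names, envir = environment())
list_names <- names(frames)

# bind_rows() copies those list names into the .id column (needs dplyr)
if (requireNamespace("dplyr", quietly = TRUE)) {
  combined <- dplyr::bind_rows(frames, .id = "DS")
  print(table(combined$DS))
}
```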


Source: https://stackoverflow.com/questions/62378965/combine-several-datasets-in-r
