R, rbind with multiple files defined by a variable

Submitted by 我的未来我决定 on 2019-12-02 08:35:37

Well, this uses an lapply, but it might be what you want.

file_list <- list.files("*your directory*", full.names = TRUE)

combined_data <- do.call(rbind, lapply(file_list, read.csv, header = TRUE))

This will turn all of your files into one large dataset, and from there it's easy to take the mean. Is that what you wanted?
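
As a rough illustration of taking the mean from the combined data, here is a small self-contained sketch. The temporary directory, the two tiny files and the column name pollutant are all made up for the example; substitute your own folder and column.

```r
# Hypothetical setup: two small CSV files in a fresh temporary directory
demo_dir <- tempfile("rbind_demo")
dir.create(demo_dir)
write.csv(data.frame(pollutant = c(1, 2, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(pollutant = c(3, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Same pattern as above: read every file, then stack the rows
file_list <- list.files(demo_dir, full.names = TRUE)
combined_data <- do.call(rbind, lapply(file_list, read.csv, header = TRUE))

# na.rm = TRUE drops missing values before averaging
combined_mean <- mean(combined_data$pollutant, na.rm = TRUE)
```

Note that rbind needs every file to have the same column names.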

An alternative way of doing this would be to step through file by file, taking sums and number of observations and then taking the mean afterwards, like so:

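# One slot per file for the running sum and the observation count
sums <- numeric(length(file_list))
n <- numeric(length(file_list))
for (i in seq_along(file_list)) {
  temp_df <- read.csv(file_list[i], header = TRUE)
  # Skip NA values so they do not turn the result into NA
  sums[i] <- sum(temp_df$pollutant, na.rm = TRUE)
  n[i] <- sum(!is.na(temp_df$pollutant))
}
new_mean <- sum(sums) / sum(n)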
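
The reason for keeping sums and counts, rather than averaging the per-file means, shows up when the files have different numbers of rows. A minimal sketch with made-up numbers (file_a and file_b stand in for the pollutant column of two files):

```r
# Made-up values standing in for the pollutant column of two files
file_a <- c(1, 2, 3, 4)
file_b <- c(10)

# Pooled mean from per-file sums and counts: (10 + 10) / 5
sums <- c(sum(file_a), sum(file_b))
n <- c(length(file_a), length(file_b))
pooled_mean <- sum(sums) / sum(n)

# Averaging the per-file means weights each file equally: (2.5 + 10) / 2
mean_of_means <- mean(c(mean(file_a), mean(file_b)))

# The pooled mean agrees with the mean over all rows taken together
all.equal(pooled_mean, mean(c(file_a, file_b)))
```

Here pooled_mean is 4 while mean_of_means is 6.25, so the two only agree when every file has the same number of observations.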

Note that both of these methods require that only your desired csvs are in that folder. You can use a pattern argument in the list.files call if you have other files in there that you're not interested in.
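
For example, something along these lines should keep only the CSV files. The folder and file names here are invented for the demonstration; pattern is a regular expression, so the dot is escaped and $ anchors the match to the end of the name.

```r
# Hypothetical folder containing two CSVs and one unrelated file
demo_dir <- tempfile("pattern_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("001.csv", "002.csv", "notes.txt")))

# "\\.csv$" matches file names that end in ".csv"
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
basename(csv_files)
```

With the files above, basename(csv_files) lists 001.csv and 002.csv and leaves out notes.txt.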

read.csv(file, ...) does not accept a vector for 'file'; it reads one file per call.

Below is a slight modification of your code. A vector of file paths is created, and sapply loops over it.

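# "directory-name/" is a placeholder; this builds 001.csv, 002.csv, ..., 332.csv
files <- paste0("directory-name/", formatC(1:332, width = 3, flag = "0"), ".csv")
pollutantmean <- function(file, pollutant) {
    dataset <- read.csv(file, header = TRUE)
    mean(dataset[, pollutant], na.rm = TRUE)
}
# pollutant has no default, so pass the column name through sapply;
# replace "your-column-name" with a column that exists in your files
sapply(files, pollutantmean, pollutant = "your-column-name")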
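
To try the sapply approach without the real data, here is a small sketch on two throwaway files. The temporary directory and the column name sulfate are assumptions made for the example only.

```r
# Hypothetical data: two small files with a made-up column "sulfate"
demo_dir <- tempfile("sapply_demo")
dir.create(demo_dir)
files <- file.path(demo_dir, paste0(formatC(1:2, width = 3, flag = "0"), ".csv"))
write.csv(data.frame(sulfate = c(1, 3, NA)), files[1], row.names = FALSE)
write.csv(data.frame(sulfate = c(4, 6)), files[2], row.names = FALSE)

pollutantmean <- function(file, pollutant) {
  dataset <- read.csv(file, header = TRUE)
  mean(dataset[, pollutant], na.rm = TRUE)
}

# Arguments after the function name are forwarded to every call;
# USE.NAMES = FALSE keeps the long file paths out of the result's names
means <- sapply(files, pollutantmean, pollutant = "sulfate", USE.NAMES = FALSE)
```

This gives one mean per file (2 and 5 here), not a single mean across all files; for the overall mean, combine sums and counts as in the first answer.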