Applying the same function to multiple files in R

谁都会走 submitted on 2021-02-15 02:55:58

Question


I am new to R and am currently working with a set of financial data. I have around 10 CSV files in my working directory; I want to analyze one of them and then apply the same commands to the rest.

Here are all the names of these files: ("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")

For example, because the Date column in the CSV files is read in as a factor, I need to convert it to the Date class:

CAN <- read.csv("CAN%10y.csv", header = TRUE, sep = ",")
CAN$Date <- as.character(CAN$Date)                    # factor -> character
CAN$Date <- as.Date(CAN$Date, format = "%m/%d/%y")    # character -> Date
CAN_merge <- merge(all.dates.frame, CAN, all = TRUE)  # keep every row from both inputs
CAN_merge$Bid.Yield.To.Maturity <- NULL               # drop the unused column

all.dates.frame is a data frame of 731 consecutive dates. I want to merge each file with it so that every file has the same number of rows, which will later let me combine the 10 files into one 731 × 11 master data frame (a Date column plus one column per file).
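A minimal sketch of how that padding works, under stated assumptions: the start date 2013-06-01, the Close column, and its values are hypothetical stand-ins, since the question does not show the real data. merge(..., all = TRUE) keeps every date from all.dates.frame and fills the dates that have no quote with NA.

```r
# Hypothetical start date: the question does not say which 731 days are covered
all.dates.frame <- data.frame(
  Date = seq(as.Date("2013-06-01"), by = "day", length.out = 731)
)

# Toy stand-in for one CSV file: only three dates have quotes.
# The column name "Close" and the values are hypothetical.
CAN <- data.frame(
  Date  = as.Date(c("2013-06-03", "2013-06-04", "2014-01-15")),
  Close = c(2.21, 2.25, 2.48)
)

# Merge on the shared Date column; all = TRUE keeps unmatched dates as NA rows
CAN_merge <- merge(all.dates.frame, CAN, all = TRUE)

nrow(CAN_merge)                # 731
sum(!is.na(CAN_merge$Close))   # 3
```

If a file could contain dates outside the 731-day range, all = TRUE would add them as extra rows; all.x = TRUE keeps only the dates in all.dates.frame (assuming each date appears at most once per file).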

I could copy and paste this code and change the file name each time, but is there a simpler approach using apply or a for loop?

Thank you very much for your help.


Answer 1:


This should do the trick; leave a comment if any part doesn't work. I wrote this blind, without testing.

Get a list of the files in your current directory whose names end in .csv:

L = list.files(".", pattern = "\\.csv$")
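The pattern argument of list.files() is a regular expression, not a literal suffix: "." matches any character, and an unanchored pattern can match anywhere in the name. This sketch applies grepl() to a few hypothetical file names to show what an escaped, anchored pattern changes.

```r
# Hypothetical file names, used only to compare the two patterns
candidates <- c("US%10y.csv", "notes_csv.txt", "backup.csv.bak")

# Unescaped and unanchored: "_csv" and ".csv" in the middle both match
grepl(".csv", candidates)      # TRUE TRUE TRUE

# Escaped dot, anchored to the end of the name
grepl("\\.csv$", candidates)   # TRUE FALSE FALSE
```

list.files(".", pattern = "\\.csv$") therefore returns only names that really end in .csv.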

Loop through the file names, reading in each file, performing the actions you want, and returning the data frame DF_Merge; lapply stores the results in a list:

O = lapply(L, function(x) {
  DF <- read.csv(x, header = TRUE, sep = ",")
  # Work on DF (the current file), not on CAN
  DF$Date <- as.character(DF$Date)
  DF$Date <- as.Date(DF$Date, format = "%m/%d/%y")
  DF_Merge <- merge(all.dates.frame, DF, all = TRUE)
  DF_Merge$Bid.Yield.To.Maturity <- NULL
  return(DF_Merge)
})

Bind all the DF_Merge data frames into one big data frame, stacked row by row:

do.call(rbind, O)
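do.call(rbind, O) stacks the files on top of each other (long format). To get the 731 × 11 layout from the question instead, the frames can be merged side by side on Date. This is a sketch of one possible technique (Reduce() plus merge()), not the answer's own code, and it assumes each cleaned frame holds Date plus exactly one value column. The two toy frames, the Close column name, and the values are hypothetical.

```r
# Toy stand-ins for two cleaned files; the real O would hold ten of them.
# The column name "Close" and all values are hypothetical.
dates <- seq(as.Date("2013-06-01"), by = "day", length.out = 4)
O <- list(
  data.frame(Date = dates, Close = c(2.1, NA, 2.2, 2.3)),
  data.frame(Date = dates, Close = c(1.9, 2.0, NA, 2.1))
)
L <- c("US%10y.csv", "UK%10y.csv")

# Rename each frame's value column to the country code before the "%"
countries <- sub("%.*", "", L)
O <- Map(function(df, country) {
  names(df)[names(df) != "Date"] <- country
  df
}, O, countries)

# Merge pairwise on Date: one Date column plus one column per file
wide <- Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE), O)
wide
```

With ten frames padded to the same 731 dates, this yields 731 rows and 11 columns (Date plus one column per country).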

I'm guessing you need some kind of indicator, so this may be useful. Create an indicator column from the country code at the start of each file name (the part before the "%"): rep(sub("%.*", "", L), each = 731)
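Taking everything before the first "%" works for both two- and three-letter country codes, whereas the first three characters would give "US%" for US%10y.csv. This sketch uses hypothetical toy frames. It uses times = sapply(O, nrow) instead of a fixed each = 731, so the labels stay aligned even if one merged frame ends up with a different number of rows.

```r
# Toy stand-ins for the merged frames in O (hypothetical values)
O <- list(
  data.frame(Date = as.Date(c("2013-06-01", "2013-06-02")), Close = c(2.1, 2.2)),
  data.frame(Date = as.Date(c("2013-06-01", "2013-06-02")), Close = c(1.9, 2.0))
)
L <- c("US%10y.csv", "UK%10y.csv")

# Country code = everything before the first "%" in each file name
country <- sub("%.*", "", L)

# Stack the frames, then repeat each code once per row of its own frame
long <- do.call(rbind, O)
long$Country <- rep(country, times = sapply(O, nrow))
long
```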




Answer 2:


A dplyr solution (untested, since no reproducible example was given):

library(dplyr)

file_list <- c("US%10y.csv", "UK%10y.csv", "GER%10y.csv","JAP%10y.csv", "CHI%10y.csv", "SWI%10y.csv","SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv")

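# Read every file into a list of data frames
can_l <- lapply(
  file_list
  , read.csv
)

# Convert Date, drop the unused column, and pad each file to the full
# date range so that every data frame ends up with the same rows
can_l <- lapply(
  can_l
  , function(df) {
    df <- df %>%
      mutate(Date = as.Date(as.character(Date), format = "%m/%d/%y")) %>%
      select(-Bid.Yield.To.Maturity)
    left_join(all.dates.frame, df, by = "Date")
  }
)

# Rows do need to match when column-binding, so keep a single Date column
# and bind the remaining columns next to it; bind_cols() makes duplicated
# column names unique
can_merge <- bind_cols(
  all.dates.frame
  , lapply(can_l, function(df) select(df, -Date))
)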



Answer 3:


One possible solution is to read all the files into R as a list, and then use lapply to apply a function to every data frame in it. For example:

# Create a vector of the CSV file names in the working directory
files <- list.files(pattern = "\\.csv$")

# Create an empty list
lst <- vector("list", length(files))

# Read the files into the list
for(i in seq_along(files)) {
    lst[[i]] <- read.csv(files[i])
}

# Apply a function to every data frame in the list
l <- lapply(lst, function(x) {
    x$Date <- as.Date(as.character(x$Date), format = "%m/%d/%y")
    return(x)
})
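The for loop that fills lst can also be written as a single lapply() call. The sketch below is self-contained. It writes two hypothetical CSV files (contents invented for illustration, with dates in the question's %m/%d/%y layout) to a temporary directory. It then reads them back into a list named after the files and applies the same Date conversion.

```r
# Self-contained demo: write two small CSV files to a temporary directory.
# File contents are hypothetical; dates use the question's %m/%d/%y layout.
demo_dir <- file.path(tempdir(), "yields_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("06/01/13", "06/02/13"), Close = c(2.1, 2.2)),
          file.path(demo_dir, "US%10y.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("06/01/13", "06/03/13"), Close = c(1.9, 2.0)),
          file.path(demo_dir, "UK%10y.csv"), row.names = FALSE)

# Read every CSV in one call; name the list elements after the files
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
lst <- setNames(lapply(files, read.csv), basename(files))

# as.character() makes this work whether read.csv returned a factor
# (the default before R 4.0) or a character column
lst <- lapply(lst, function(x) {
  x$Date <- as.Date(as.character(x$Date), format = "%m/%d/%y")
  x
})

str(lst)
```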

Hope it's helpful.



Source: https://stackoverflow.com/questions/30790114/applying-same-function-on-multiple-files-in-r
