Question
I have 82 .csv files, each of them a zoo object, with the following format:
"Index", "code", "pp"
1951-01-01, 2030, 22.9
1951-01-02, 2030, 0.5
1951-01-03, 2030, 0.0
I want to do a correlation matrix between the pp of all of my files. I found out how to do it "manually" between two files:
zz <- merge(x, y, all = FALSE)
z <- cbind(zz[,2], zz[,4])
cor(z, use = "complete.obs")
But I can't come up with a loop to do it for all the files. A few things to consider: each file starts and ends at different dates, and I would like the matrix to show the codes so I can tell which column is which.
Can anyone help?
Answer 1:
I think you have the bones of a perfectly good solution here, actually. If you start with list.files() to generate a list of your csv files:
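# full.names = TRUE returns full paths, so read.csv() works from any working directory
fileList <- list.files(path = "path/to/csv/files", full.names = TRUE)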
then read in all the files using lapply():
datList <- lapply(fileList, read.csv)
then merge the first two files (assuming each file contains a single code):
dat <- merge(datList[[1]][,-2], datList[[2]][,-2], by = "Index",
             suffixes = c(datList[[1]]$code[1], datList[[2]]$code[1]))
The suffixes argument is appended to the clashing pp column names, so the columns are labelled by code for future reference. Then loop over the rest of datList using a simple for loop, merging each one with dat:
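for (i in 3:length(datList)) {
  # suffixes only apply to clashing names, so rename pp by code before merging
  nxt <- datList[[i]][,-2]
  names(nxt)[2] <- paste0("pp", datList[[i]]$code[1])
  dat <- merge(dat, nxt, by = "Index")
}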
and then you should be able to run cor on dat minus the first column. You might have to tweak this code a bit, but this general idea ought to work.
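The steps above can also be collapsed into one pass with Reduce(). What follows is only a sketch under stated assumptions, not the answer's exact method: the three small files it writes to a temporary directory are made-up data so the code runs end to end, readOne is a hypothetical helper name, and it swaps in an outer join (all = TRUE) with use = "pairwise.complete.obs" so that files covering different date ranges are correlated over whatever dates each pair shares, rather than only over dates present in every file.

```r
# Made-up example files in a temporary directory, only so the sketch runs.
csvDir <- file.path(tempdir(), "pp_csv")
dir.create(csvDir, showWarnings = FALSE)
set.seed(1)
starts <- c("1951-01-01", "1951-01-03", "1951-01-05")
codes <- c(2030, 2040, 2050)
for (k in seq_along(codes)) {
  dates <- seq(as.Date(starts[k]), by = "day", length.out = 20)
  write.csv(data.frame(Index = dates, code = codes[k],
                       pp = round(runif(20, 0, 30), 1)),
            file.path(csvDir, paste0(codes[k], ".csv")), row.names = FALSE)
}

# full.names = TRUE so read.csv() receives usable paths
fileList <- list.files(csvDir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: keep Index and pp, renaming pp after the file's code
readOne <- function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  out <- d[, c("Index", "pp")]
  names(out)[2] <- paste0("pp", d$code[1])
  out
}
datList <- lapply(fileList, readOne)

# Outer-join all files on Index; dates missing from a file become NA
dat <- Reduce(function(a, b) merge(a, b, by = "Index", all = TRUE), datList)

# Correlate each pair of columns over the dates that pair has in common
corMat <- cor(dat[, -1], use = "pairwise.complete.obs")
print(round(corMat, 2))
```

Because each pp column is renamed before merging, no two columns ever clash, and the row and column names of corMat carry the codes directly. If you would rather restrict the correlation to dates present in every file, as in the two-file example from the question, drop all = TRUE and use use = "complete.obs" instead.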
Source: https://stackoverflow.com/questions/6229676/correlation-matrix-between-different-files