Question
I've been trying to write a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, the function should return a numeric vector of length 0. The code generates multiple errors, so I'm not listing them here.
The data files for the code are here: https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip
Code
corr<-function(directory, threshold=0){
  files.list=list.files(directory, full.names=TRUE, pattern=".csv")
  comp.sum<-numeric()
  num<-numeric()
  for(i in 1:332){
    data<-read.csv(files.list[i])
    data.cor<-na.omit(data[,2:3])
    comp.sum<-sum(data.cor)
    if
    {
      comp.sum>threshold
      cor.var<-cor(data.cor, use="all.obs")
    }
    else
    {
      num
    }
  }
  cor.var
}
Answer 1:
I modified the function a bit to do what you want. This assumes that sulfate and nitrate are always in columns 2 and 3, and that the directory contains no other CSV files (if another file has numbers in those columns, a correlation coefficient would be calculated for unrelated data).
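corr <- function(directory, threshold = 0) {
  # Match only file names ending in ".csv" (pattern is a regular expression)
  files.list <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  cors <- numeric(0)
  for (i in seq_along(files.list)) {
    data <- read.csv(files.list[i], header = TRUE)
    # Keep rows where both sulfate and nitrate (columns 2 and 3) are observed
    data.cor <- na.omit(data[, 2:3])
    nobs <- nrow(data.cor)
    # Only monitors above the threshold contribute a correlation
    if (nobs > threshold) {
      cors <- c(cors, cor(data.cor[, 1], data.cor[, 2]))
    }
  }
  # numeric(0) when no monitor meets the threshold
  return(cors)
}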
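If you would rather not depend on column positions, here is a rough sketch that picks the columns by name and counts complete cases across all variables, as the question describes. It assumes every CSV has columns named sulfate and nitrate; corr_by_name, the temporary directory and the two small files are made up for illustration and are not part of specdata.

```r
# Hypothetical variant of corr(): selects columns by name, not position.
# Assumption: every CSV has columns named "sulfate" and "nitrate".
corr_by_name <- function(directory, threshold = 0) {
  files <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  cors <- numeric(0)
  for (f in files) {
    data <- read.csv(f, header = TRUE)
    # Keep only rows with no missing value in any column
    complete <- data[complete.cases(data), ]
    if (nrow(complete) > threshold) {
      cors <- c(cors, cor(complete$sulfate, complete$nitrate))
    }
  }
  cors
}

# Synthetic check in a temporary directory (made-up data)
tmp <- file.path(tempdir(), "corr_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(
  data.frame(Date = "2020-01-01", sulfate = c(1, 2, 3, NA),
             nitrate = c(2, 4, 6, 8), ID = 1),
  file.path(tmp, "001.csv"), row.names = FALSE
)
write.csv(
  data.frame(Date = "2020-01-01", sulfate = c(1, NA),
             nitrate = c(NA, 2), ID = 2),
  file.path(tmp, "002.csv"), row.names = FALSE
)

# First file has 3 complete rows, second has none
res <- corr_by_name(tmp, threshold = 2)
none <- corr_by_name(tmp, threshold = 5)
```

With these two files, only the first monitor clears a threshold of 2, so res holds a single correlation, and none should be a numeric vector of length 0.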
Source: https://stackoverflow.com/questions/40177220/cor-function-in-r-producing-errors