Question
I have a data.frame and I want to calculate correlation coefficients using one column against the other columns (there are some non-numeric columns in the frame as well).
ddply(Banks,.(brand_id,standard.quarter),function(x) { cor(BLY11,x) })
# Error in cor(BLY11, x) : 'y' must be numeric
I tested against is.numeric(x):
ddply(Banks,.(brand_id,standard.quarter),function(x) { if (is.numeric(x)) cor(BLY11,x) else 0 })
but that failed every comparison, returned 0, and returned only one column, as if it's only being called once. What is being passed to the function? I'm new to R and I think there's something fundamental I'm missing.
Thanks
Answer 1:
Try something like this:
cor(longley[, 1], longley[ , sapply(longley, is.numeric)])
GNP.deflator GNP Unemployed Armed.Forces Population Year Employed
[1,] 1 0.9915892 0.6206334 0.4647442 0.9791634 0.9911492 0.9708985
Answer 2:
From ?cor:
If ‘x’ and ‘y’ are matrices then the covariances (or correlations) between the columns of ‘x’ and the columns of ‘y’ are computed.
So your only real job is to remove the non-numeric columns:
# An example data.frame containing a non-numeric column
d <- cbind(fac=c("A","B"), mtcars)
## Calculate correlations between the mpg column and all numeric columns
cor(d$mpg, as.matrix(d[sapply(d, is.numeric)]))
mpg cyl disp hp drat wt qsec
[1,] 1 -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
vs am gear carb
[1,] 0.6640389 0.5998324 0.4802848 -0.5509251
Edit: And in fact, as @MYaseen208's answer shows, there's no need to explicitly convert data.frames to matrices. Both of the following work just fine:
cor(d$mpg, d[sapply(d, is.numeric)])
cor(mtcars, mtcars)
Answer 3:
This function operates on a chunk:
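calc_cor_only_numeric = function(chunk) {
  # Logical vector: TRUE for each numeric column in the chunk
  is_numeric = sapply(chunk, is.numeric)
  # Keep only the numeric columns (negating a logical index would not do this)
  return(cor(chunk[is_numeric]))
}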
And it can be used by ddply:
ddply(banks, .(cat1, cat2), calc_cor_only_numeric)
I could not check the code, but this should get you started.
Answer 4:
ddply splits a data.frame into chunks and sends them (smaller data.frames) to your function. Your x is a data.frame with the same columns as Banks. Thus, is.numeric(x) is FALSE, while is.data.frame(x) should return TRUE.
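To see this for yourself, here is a small sketch that inspects one chunk. It assumes the built-in iris data in place of Banks, and uses base R's split() as a stand-in for the chunking that ddply does; the variable names are just for illustration.

```r
# split() is used here as a stand-in for the chunking that ddply performs.
chunks <- split(iris, iris$Species)
x <- chunks[[1]]  # first chunk: the rows for one Species

# The chunk as a whole is a data.frame, so is.numeric() is FALSE for it.
chunk_is_numeric <- is.numeric(x)
chunk_is_df <- is.data.frame(x)

# Testing column by column gives one TRUE/FALSE per column.
numeric_cols <- sapply(x, is.numeric)
print(numeric_cols)
```

This is why the is.numeric(x) test in the question always took the else branch: the check has to be applied per column, not to the chunk.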
Try:
function(x) {
cor(x$BLY11, x$othercolumnname)
}
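If you want one column against all the other numeric columns within each chunk, the two ideas can be combined. The following is a rough sketch, not tested against your data: it uses the built-in iris data instead of Banks, cor_against is a hypothetical helper name, and base R's split() and lapply() stand in for ddply so that it runs without plyr.

```r
# Hypothetical helper: correlate one column of a chunk against every other
# numeric column of that chunk. Returns a one-row data.frame.
cor_against <- function(chunk, target) {
  numeric_cols <- names(chunk)[sapply(chunk, is.numeric)]
  others <- setdiff(numeric_cols, target)  # do not correlate the target with itself
  as.data.frame(cor(chunk[[target]], chunk[others]))
}

# Base R stand-in for ddply: split into chunks, apply, then stack the rows.
chunks <- split(iris, iris$Species)
per_group <- do.call(rbind, lapply(chunks, cor_against, target = "Sepal.Length"))
print(per_group)
```

With plyr loaded, ddply(iris, .(Species), cor_against, target = "Sepal.Length") should give the same numbers, with Species as a regular column. Note that a numeric column that is constant within a chunk will produce NA and a warning from cor.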
Answer 5:
It looks like what you're doing can be done with sapply as well:
with(Banks,
sapply( list(brand_id,standard.quarter), function(x) cor(BLY11,x) )
)
Source: https://stackoverflow.com/questions/12182105/how-can-correlate-against-multiple-columns-using-ddply