R: Calculating Pearson correlation and R-squared by group

白昼怎懂夜的黑 submitted on 2019-12-01 08:10:54

Question


I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation between temperature and humidity for each month of the year (1 = January), we would have to repeat the same call for every month (12 times):

cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

Is there any way to do each month automatically?

In my case I have more than 30 groups (species rather than months) for which I would like to test correlations, so I wanted to know whether there is a faster way than doing them one by one.

Thank you!


Answer 1:


cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])

gives you a 2 x 2 correlation matrix rather than a single number. I bet you want a single number for each Month, so use

## cor(Temp, Humidity | Month)
with(airquality, mapply(cor, split(Temp, Month), split(Humidity, Month)) )

and you will obtain a vector.
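For the species case in the question, here is a minimal sketch of the same split-and-apply idea applied to whole data frames. It assumes one grouping column; the built-in iris data (grouped by Species, with Sepal.Length and Petal.Length as the two variables) only stands in for the asker's own data:

```r
# Stand-in data: iris grouped by Species (swap in your own data frame,
# grouping column and pair of variables)
by_species <- split(iris, iris$Species)
# One Pearson correlation per group, returned as a named numeric vector
r_species <- sapply(by_species, function(d) cor(d$Sepal.Length, d$Petal.Length))
r_species
```

Splitting the data frame once, rather than each column separately, keeps the two variables paired by construction. This may be convenient when there are 30+ groups.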

Have a read of ?split and ?mapply; they are very useful for "by group" operations, although they are not the only option. Also read ?cor and compare the difference between

a <- rnorm(10)
b <- rnorm(10)
cor(a, b)
cor(cbind(a, b))

The answer you linked in your question is doing something similar to cor(cbind(a, b)).
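To make the comparison concrete, here is a short sketch. cor(a, b) returns a single number, while cor(cbind(a, b)) returns a 2 x 2 matrix with 1s on the diagonal and that same number off the diagonal. The set.seed() call is added only so the random draws are reproducible:

```r
set.seed(1)                   # reproducible random draws
a <- rnorm(10)
b <- rnorm(10)
r_scalar <- cor(a, b)         # a single correlation coefficient
r_matrix <- cor(cbind(a, b))  # 2 x 2 correlation matrix, rows/cols named "a", "b"
r_matrix
# The off-diagonal entry matches the scalar result (up to floating-point tolerance)
all.equal(r_matrix["a", "b"], r_scalar)
```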


Reproducible example

The airquality dataset in R does not have a Humidity column, so I will use Wind for testing:

## cor(Temp, Wind | Month)
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)) )

#         5          6          7          8          9 
#-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701 

We get a named vector, where names(x) gives the Month and unname(x) gives the correlation.
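If a table is more convenient than a named vector, the names and values can be combined into a data frame. Here is a minimal sketch that recomputes the Temp/Wind correlations. The column names month and r are arbitrary choices:

```r
# Per-month correlation of Temp and Wind, as in the answer
x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
# names(x) are the months as character ("5", ..., "9"); unname(x) the correlations
res <- data.frame(month = as.integer(names(x)), r = unname(x))
res
```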


Thank you very much! It worked perfectly! I was also trying to obtain a vector with the R^2 for each correlation, but I can't figure it out... Any ideas?

cor(x, y) equals the slope of a linear regression fitted to standardised variables:

coef(lm(scale(y) ~ scale(x) - 1))  ## remember to drop intercept
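Here is a quick numerical check of this identity. It is a sketch on one month of airquality (Month 5), with Wind standing in for x and Temp for y. The slope from the no-intercept fit on standardised variables should match cor() up to floating-point error:

```r
may  <- subset(airquality, Month == 5)
wind <- may$Wind
temp <- may$Temp
# Standardise both variables (mean 0, sd 1); as.vector() drops scale()'s matrix shape
z_wind <- as.vector(scale(wind))
z_temp <- as.vector(scale(temp))
slope <- unname(coef(lm(z_temp ~ z_wind - 1)))  # no intercept, as noted above
r <- cor(wind, temp)                            # Pearson correlation
all.equal(slope, r)                             # TRUE up to numerical tolerance
```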

In this simple linear regression, R-squared is just the square of the slope. Since x already stores the correlation for each group, the per-group R-squared is simply x ^ 2.
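As a cross-check, here is a sketch that fits one ordinary lm() (with intercept) per month and extracts summary()$r.squared. It again uses Wind in place of the missing Humidity column. The results should agree with the squared correlations:

```r
# Squared per-month correlations
r <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
r2_from_cor <- r^2
# R-squared from a separate simple regression per month
r2_from_lm <- sapply(split(airquality, airquality$Month),
                     function(d) summary(lm(Temp ~ Wind, data = d))$r.squared)
all.equal(r2_from_cor, r2_from_lm)  # TRUE up to numerical tolerance
```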



Source: https://stackoverflow.com/questions/40793506/r-calculating-pearson-correlation-and-r-squared-by-group
