R: Calculating Pearson correlation and R-squared by group

青春惊慌失措 2021-01-15 14:56

I am trying to extend the answer to the question "R: filtering data and calculating correlation".

To obtain the correlation of temperature and humidity for each month of

1 answer
  • 2021-01-15 15:22
    cor(airquality[airquality$Month == 1, c("Temp", "Humidity")])
    

    gives you a 2 x 2 correlation matrix rather than a single number. I bet you want a single number for each Month, so use

    ## cor(Temp, Humidity | Month)
    with(airquality, mapply(cor, split(Temp, Month), split(Humidity, Month)) )
    

    and you will obtain a vector.

    Have a read of ?split and ?mapply; they are very useful for "by group" operations, although they are not the only option. Also read ?cor, and compare the output of

    a <- rnorm(10)
    b <- rnorm(10)
    cor(a, b)
    cor(cbind(a, b))
    

    The answer you linked in your question is doing something similar to cor(cbind(a, b)).
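    As a quick sketch of the connection: the off-diagonal entry of the 2 x 2 matrix is the same single number that cor(a, b) returns. The seed value below is arbitrary and only there to make the run repeatable.

    ```r
    set.seed(42)           # arbitrary seed, only for reproducibility
    a <- rnorm(10)
    b <- rnorm(10)

    m <- cor(cbind(a, b))  # 2 x 2 correlation matrix, 1s on the diagonal
    print(m)
    print(m["a", "b"])     # off-diagonal entry: the single correlation
    print(all.equal(m["a", "b"], cor(a, b)))
    ```

    So if you do use the matrix form, m["a", "b"] (or m[1, 2]) pulls out the one number you actually want.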


    Reproducible example

    The airquality dataset in R does not have a Humidity column, so I will use Wind for testing:

    ## cor(Temp, Wind | Month)
    x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)) )
    
    #         5          6          7          8          9 
    #-0.3732760 -0.1210353 -0.3052355 -0.5076146 -0.5704701 
    

    We get a named vector, where names(x) gives the months and unname(x) gives the correlations.
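    For comparison, here is a sketch of one of the other by-group options: split the whole data frame by Month and apply a small function to each piece with vapply(). It should give the same named vector as the mapply() call. Temp and Wind have no missing values in airquality; for columns that do (such as Ozone), you would need something like cor(..., use = "complete.obs").

    ```r
    # Named list of data frames, one per month ("5" to "9")
    by_month <- split(airquality, airquality$Month)

    # One correlation per month; numeric(1) makes vapply check each result is a single number
    x2 <- vapply(by_month, function(d) cor(d$Temp, d$Wind), numeric(1))

    # The mapply() version from above, for comparison
    x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
    print(all.equal(x, x2))
    ```

    Splitting the data frame is handy when the per-group function needs more than two columns; splitting individual vectors, as mapply() does here, is enough for a plain pairwise correlation.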


    Thank you very much! It worked perfectly! I have also been trying to get a vector with the R^2 for each correlation, but I can't figure out how... Any ideas?

    cor(x, y) is like fitting a standardised linear regression model:

    coef(lm(scale(y) ~ scale(x) - 1))  ## remember to drop intercept
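    To check this numerically, a sketch using the May rows of airquality (Temp plays the role of x and Wind of y; the choice of month is arbitrary): the slope from the standardised, no-intercept fit matches cor() up to floating-point error.

    ```r
    may <- subset(airquality, Month == 5)

    # Standardise both variables and fit without an intercept
    fit <- lm(scale(Wind) ~ scale(Temp) - 1, data = may)
    slope <- as.numeric(coef(fit))  # drop the coefficient name

    r <- cor(may$Temp, may$Wind)
    print(c(slope = slope, r = r))
    print(all.equal(slope, r))
    ```

    The reason is that after scaling, the slope sum(zx * zy) / sum(zx^2) reduces to the correlation, because sum(zx^2) = n - 1 and sum(zx * zy) = (n - 1) * r.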
    

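    The R-squared in this simple linear regression is just the square of the slope, i.e. the squared correlation. Since x (the per-month vector from the reproducible example, not the x in the formula above) already stores the correlation for each group, the R-squared for each group is just x ^ 2.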
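    Putting it together, a sketch that computes the per-month R-squared from the correlations and cross-checks it against summary(lm(...))$r.squared fitted separately in each month. For a simple regression with an intercept, R-squared equals the squared correlation, whichever variable is treated as the response.

    ```r
    # Per-month correlations, as in the reproducible example
    x <- with(airquality, mapply(cor, split(Temp, Month), split(Wind, Month)))
    r2 <- x^2  # R-squared per month; the month names are kept

    # Cross-check: fit Wind ~ Temp within each month and extract R-squared
    r2_lm <- vapply(
      split(airquality, airquality$Month),
      function(d) summary(lm(Wind ~ Temp, data = d))$r.squared,
      numeric(1)
    )
    print(r2)
    print(all.equal(r2, r2_lm))
    ```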
