Calculate Correlations of Pairs of Columns in a Data Frame in R

左心房为你撑大大i submitted on 2019-12-25 01:53:56

Question


I have the following dataframe:

set.seed(1)
y <- data.frame(a1 = rnorm(5) , b1 = rnorm(5), c1 = rnorm(5),  a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

I would like to obtain the correlations of the pairs of columns: cor(a1,a2), cor(b1,b2), cor(c1,c2)

I tried the following, but it returns NA for every column:

apply(y,2,function(x) cor(x[1],x[3]))

I would like to get the result equivalent to

cor(y[,1],y[,4])
cor(y[,2],y[,5])
cor(y[,3],y[,6])

In my actual data frame, I have many more pairs of columns.

Any ideas?

Thanks for your support.


Answer 1:


num.vars <- length(y)
var1 <- head(names(y), num.vars / 2)
var2 <- tail(names(y), num.vars / 2)
mapply(cor, y[var1], y[var2])
#         a1         b1         c1 
#  0.2491625 -0.5313192  0.5594564 
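
A note on why the attempt in the question returns NA: apply(y, 2, ...) passes each column to the function as a single vector, so x[1] and x[3] are single numbers, and the correlation of two length-one vectors is undefined. mapply instead walks two lists of columns in parallel. The following is a minimal sketch, assuming the first half of the columns pairs with the second half in the same order; the names half, paired and manual are illustrative only.

```r
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5),
                a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

# Assumption: column i is paired with column i + half
half <- ncol(y) / 2
paired <- mapply(cor, y[seq_len(half)], y[half + seq_len(half)])

# The same values computed one pair at a time, as in the question
manual <- c(cor(y[, 1], y[, 4]), cor(y[, 2], y[, 5]), cor(y[, 3], y[, 6]))

# Both routes should agree
stopifnot(isTRUE(all.equal(unname(paired), manual)))
```

The result keeps the names of the first list of columns (a1, b1, c1), which makes it easy to see which pair each value belongs to.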



Answer 2:


Another approach uses a regular expression on the variable names. It also works if the columns are in arbitrary order.

nn <- unique(sub('([0-9]+)', '', names(y)))

sapply(nn, function(x) {
    xy <- y[, grep(x, names(y))]
    cor(xy[, 1], xy[, 2])
})
#          a          b          c 
#  0.2491625 -0.5313192  0.5594564 
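
One caveat with matching on the bare prefix: grep(x, names(y)) selects every name that merely contains x, so prefixes such as "a" and "ab" would collide. The following is a hedged sketch of a stricter variant, assuming each name is a prefix followed by digits, each prefix occurs exactly twice, and prefixes contain no regex metacharacters; prefixes and pair_cor are illustrative names.

```r
set.seed(1)
y <- data.frame(a1 = rnorm(5), b1 = rnorm(5), c1 = rnorm(5),
                a2 = rnorm(5), b2 = rnorm(5), c2 = rnorm(5))

# Shuffle the columns to show that order does not matter
y <- y[c("c2", "a1", "b2", "a2", "c1", "b1")]

# Strip the trailing digits to get one prefix per pair
prefixes <- unique(sub("[0-9]+$", "", names(y)))

pair_cor <- sapply(prefixes, function(p) {
  # Anchor the pattern so "a" cannot match a name like "ab1"
  cols <- grep(paste0("^", p, "[0-9]+$"), names(y))
  stopifnot(length(cols) == 2)  # assumption: exactly two columns per prefix
  cor(y[[cols[1]]], y[[cols[2]]])
})
```

The result is named by prefix, in the order the prefixes first appear among the column names.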


Source: https://stackoverflow.com/questions/21898402/calculate-correlations-of-pairs-of-columns-in-a-data-frame-in-r
