Calculate correlation by aggregating columns of data frame

Submitted by 断了今生、忘了曾经 on 2019-12-09 03:25:04

Question


I have the following data frame:

y <- data.frame(group = letters[1:5], a = rnorm(5) , b = rnorm(5), c = rnorm(5), d = rnorm(5) )

How can I get a data frame that gives, for each row, the correlation between columns a, b and columns c, d?

Something like: sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S


Answer 1:


You could use apply:

> apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
[1] -1 -1  1 -1  1

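Or ddply from the plyr package (this may be overkill; note that if two rows share the same group, the values from those rows are pooled, so the correlation is computed between all of that group's a and b values and all of its c and d values):

> library(plyr)
> ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))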
  group V1
1     a -1
2     b -1
3     c  1
4     d -1
5     e  1



Answer 2:


You can use apply to apply a function to each row (or column) of a matrix, array or data.frame.

apply(
  y[,-1], # Remove the first column, to ensure that u remains numeric
  1,      # Apply the function on each row
  function(u) cor( u[1:2], u[3:4] )
)

(With just 2 observations, the correlation can only be +1 or -1.)
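To see why only +1 or -1 can occur, here is a minimal sketch. The seed and the helper variables r and s are assumptions made for illustration, not part of the original answer. With two points per vector, the correlation reduces to whether both pairs move in the same direction, so it should match the sign of (b - a) * (d - c).

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5),
                c = rnorm(5), d = rnorm(5))

# Row-wise correlation of (a, b) against (c, d)
r <- apply(y[, -1], 1, function(u) cor(u[1:2], u[3:4]))

# With two points per vector, only the direction of change matters:
# +1 if a -> b and c -> d move the same way, -1 otherwise
s <- sign((y$b - y$a) * (y$d - y$c))

# Compare the two results (all.equal allows for floating-point noise)
all.equal(unname(r), s)
```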




Answer 3:


You're almost there: you just need to use apply instead of sapply, and remove unnecessary columns.

apply(y[-1], 1, function(x) cor(x[1:2], x[3:4]))

Of course, the correlation between two length-2 vectors isn't very informative.
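Since the question asks for a data frame rather than a bare vector, the apply call above could be wrapped as in the following sketch. The seed and the column name cor_ab_cd are hypothetical choices for illustration. The comments also note why sapply does not fit here: on a data frame it iterates over columns, not rows.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5),
                c = rnorm(5), d = rnorm(5))

# sapply(y, f) would call f once per column (group, a, b, c, d);
# apply(y[-1], 1, f) calls f once per row of the numeric columns
res <- data.frame(
  group = y$group,
  cor_ab_cd = apply(y[-1], 1, function(x) cor(x[1:2], x[3:4]))  # hypothetical column name
)
res
```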



Source: https://stackoverflow.com/questions/8845256/calculate-correlation-by-aggregating-columns-of-data-frame
