Grouped correlation with dplyr (works only on console)

*爱你&永不变心* 提交于 2019-11-28 08:21:52

问题


I'm trying to use dplyr to calculate grouped correlations, but something is clearly wrong since the code below works only in the console:

require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))
gp = group_by(xx, group)
summarize(gp, cor(a, b))

  group   cor(a, b)
1     1 -0.02073084
2     2  0.12803353
3     3  0.06236264
4     4 -0.06181904

If i use the same code in RStudio, i get:

   cor(a, b)
1 0.02739193

What's happening?


回答1:


What you experience is related to having both plyr and dplyr loaded at the same time. Since both packages have summarize functions, there can be conflicts if you don't specify explicitly which package you want to use. For the example data, this means:

require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))

Using dplyr as intended:

gp = group_by(xx, group)
dplyr::summarize(gp, cor(a, b))
#Source: local data frame [4 x 2]
#
#  group   cor(a, b)
#1     1 -0.02073084
#2     2  0.12803353
#3     3  0.06236264
#4     4 -0.06181904

Or using plyr

gp = group_by(xx, group)
plyr::summarize(gp, cor(a, b))
#   cor(a, b)
#1 0.02739193

So either avoid loading both packages or specify the package by using package::function.



来源:https://stackoverflow.com/questions/25023657/grouped-correlation-with-dplyr-works-only-on-console

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!