How do I pass variables to a custom function in ddply?

Submitted by 狂风中的少年 on 2019-12-05 02:34:43

Question


Consider the following data:

d = data.frame(
    experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
    si = runif(5),
    ti = runif(5)
)

I would like to perform a correlation test for si and ti, for each experiment factor level. So I thought I'd run:

ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

> ddply(d, .(experiment), cor.test, x = si, y = ti)
Error in .fun(piece, ...) : object 'si' not found
> ddply(d, .(experiment), cor.test, si, ti)
Error in match.arg(alternative) : 
  'arg' must be NULL or a character vector

Is there anything obvious I'm missing? The plyr documentation does not include an example that covers my case. Most commands I see only use summarize as the function call, but the approaches I am used to from summarize don't work here, as can be seen above.


Answer 1:


ddply splits your data frame by the variables you select (experiment here) and then passes the function the resulting subsets of the data frame. In your case your function cor.test doesn't accept a data frame as an input, so you need a translation layer:

library(plyr)
d <- data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)
ddply(d, .(experiment), function(d.sub) cor.test(d.sub$si, d.sub$ti)$statistic)
#   experiment         t
# 1        bar 0.1517205
# 2        foo 0.3387682

Also, your output has to be something like a vector or a data frame, which is why I just chose $statistic above, but you could have added multiple variables if you wanted.
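
As a minimal sketch of returning more than one value, the function can return a named numeric vector, and ddply should turn each name into a column. The seed and the column names t and p are arbitrary choices for illustration, not part of the original answer:

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)

# Return a named numeric vector per subset; each name becomes a column.
res <- ddply(d, .(experiment), function(d.sub) {
  ct <- cor.test(d.sub$si, d.sub$ti)
  c(t = unname(ct$statistic), p = ct$p.value)
})
res
```

Every subset has to return a vector of the same length, otherwise ddply cannot bind the pieces together.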

Side note: I had to add a value to the input data frame, because cor.test won't run on only 2 values (which was the case for "bar"). If you want more comprehensive stats, you can try:

ddply(d, .(experiment), function(d.sub) {
  as.data.frame(cor.test(d.sub$si, d.sub$ti)[c("statistic", "parameter", "p.value", "estimate")])
} )
#   experiment statistic parameter   p.value  estimate
# 1        bar 0.1517205         1 0.9041428 0.1500039
# 2        foo 0.3387682         1 0.7920584 0.3208567 

Note that since we're now returning something more complex than just a vector, we need to coerce it to a data.frame. If you want to include more complex values (e.g. the confidence interval, which is a two-value result), you would have to simplify them first.
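
One way to simplify the confidence interval is to split it into two scalar columns. This is only a sketch: the data frame d5 and the column names conf.low and conf.high are made up for illustration, and it uses 5 rows per group because cor.test only reports conf.int for Pearson's correlation when there are at least 4 complete pairs:

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
# Assumption: 5 rows per group so that cor.test returns a conf.int.
d5 <- data.frame(
  experiment = as.factor(rep(c("foo", "bar"), each = 5)),
  si = runif(10),
  ti = runif(10)
)

res_ci <- ddply(d5, .(experiment), function(d.sub) {
  ct <- cor.test(d.sub$si, d.sub$ti)
  # Flatten the two-element interval into two one-value columns.
  data.frame(
    estimate  = unname(ct$estimate),
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2]
  )
})
res_ci
```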




Answer 2:


You can use summarize for this if you don't mind running cor.test once per summary column for each experiment (i.e., performance isn't an issue).

#note that you need at least 3 value pairs for cor.test
set.seed(42)
d = data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)

library(plyr)
ddply(d, .(experiment), summarize,
      r=cor.test(si, ti)$estimate,
      p=cor.test(si, ti)$p.value
      )

#  experiment           r         p
#1        bar  0.07401492 0.9528375
#2        foo -0.41842834 0.7251622
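
If the repeated calls are a concern, a possible alternative is a small wrapper that runs cor.test once per group. The helper name cor_summary is hypothetical. The sketch also relies on ddply forwarding extra arguments to the function through `...`, which is how a setting such as method can be passed along:

```r
library(plyr)

set.seed(42)
d <- data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)

# Hypothetical helper: one cor.test call per subset; any extra
# arguments (e.g. method) are handed on to cor.test via `...`.
cor_summary <- function(d.sub, ...) {
  ct <- cor.test(d.sub$si, d.sub$ti, ...)
  data.frame(r = unname(ct$estimate), p = ct$p.value)
}

res_pearson  <- ddply(d, .(experiment), cor_summary)
res_spearman <- ddply(d, .(experiment), cor_summary, method = "spearman")
res_pearson
res_spearman
```

The column names si and ti are hard-coded inside the helper, so it only works for data frames shaped like d.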


Source: https://stackoverflow.com/questions/20845409/how-do-i-pass-variables-to-a-custom-function-in-ddply
