Object not found error with ddply inside a function

谁都会走 提交于 2019-11-27 03:38:36

You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:

myFunction <- function(x,y){
NewColName <- "a"
z <- do.call("ddply",list(x, y, summarize, Ave = call("mean",as.symbol(NewColName),na.rm=TRUE)))
return(z)
}

myFunction(d.f,sv)
  b Ave
1 0 1.5
2 1 3.5

Today's solution to this question is to make summarize into here(summarize). e.g.

myFunction <- function(x, y){
    NewColName = "a"
    z = ddply(x, y, here(summarize),
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
    )
    return(z)
}

here(f), added to plyr in Dec 2012, captures the current context.

I occasionally run into problems like this when combining ddply with summarize or transform or something and, not being smart enough to divine the ins and outs of navigating various environments I tend to side-step the issue by simply not using summarize and instead using my own anonymous function:

myFunction <- function(x, y){
    NewColName <- "a"
    z <- ddply(x, y, .fun = function(xx,col){
                             c(Ave = mean(xx[,col],na.rm=TRUE))}, 
               NewColName)
    return(z)
}

myFunction(df,sv)

Obviously, there is a cost to doing this stuff 'manually', but it often avoids the headache of dealing with the evaluation issues that come from combining ddply and summarize. That's not to say, of course, that Hadley won't show up with a solution...

The problem lies in the code of the plyr package itself. In the summarize function, there is a line eval(substitute(...),.data,parent.frame()). It is well known that parent.frame() can do pretty funky and unexpected stuff. T

he solution of @James is a very nice workaround, but if I remember right @Hadley himself said before that the plyr package was not intended to be used within functions.

Sorry, I was wrong here. It is known though that for the moment, the plyr package gives problems in these situations.

Hence, I give you a base solution for the problem :

myFunction <- function(x, y){
    NewColName = "a"
    z = aggregate(x[NewColName],x[y],mean,na.rm=TRUE)
    return(z)
}
> myFunction(df,sv)
  b   a
1 0 1.5
2 1 3.5

Looks like you have an environment problem. Global assignment fixes the problem, but at the cost of one's soul:

library(plyr)

a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
d.f = data.frame(a,b,c)
sv = c("b")

ColName = "a"
ddply(d.f, sv, summarize,
        Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)

myFunction <- function(x, y){
    NewColName <<- "a"
    z = ddply(x, y, summarize,
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
    )
    return(z)
}

myFunction(x=d.f,y=sv)

eval is looking in parent.frame(1). So if you instead define NewColName outside MyFunction it should work:

rm(NewColName)
NewColName <- "a"
myFunction <- function(x, y){

    z = ddply(x, y, summarize,
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
    )
    return(z)
}
myFunction(x=d.f,y=sv)

By using get to pull out my.parse from the earlier environment, we can come much closer, but still have to pass curenv as a global:

myFunction <- function(x, y){
    NewColName <- "a"
    my.parse <- parse(text=NewColName)
    print(my.parse)
    curenv <<- environment()
    print(curenv)

    z = ddply(x, y, summarize,
            Ave = mean( eval( get("my.parse" , envir=curenv ) ), na.rm=TRUE)
    )
    return(z)
}

> myFunction(x=d.f,y=sv)
expression(a)
<environment: 0x0275a9b4>
  b Ave
1 0 1.5
2 1 3.5

I suspect that ddply is evaluating in the .GlobalEnv already, which is why all of the parent.frame() and sys.frame() strategies I tried failed.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!