Object not found error with ddply inside a function

别跟我提以往 2020-11-30 01:58

This has really challenged my ability to debug R code.

I want to use ddply() to apply the same functions to different columns that are sequentially nam

5 answers
  • 2020-11-30 02:13

    I occasionally run into problems like this when combining ddply with summarize or transform. Not being smart enough to divine the ins and outs of navigating the various environments, I tend to side-step the issue by not using summarize at all and passing my own anonymous function instead:

    myFunction <- function(x, y){
        NewColName <- "a"
        z <- ddply(x, y, .fun = function(xx,col){
                                 c(Ave = mean(xx[,col],na.rm=TRUE))}, 
                   NewColName)
        return(z)
    }
    
    myFunction(d.f,sv)
    

    Obviously, there is a cost to doing this stuff 'manually', but it often avoids the headache of dealing with the evaluation issues that come from combining ddply and summarize. That's not to say, of course, that Hadley won't show up with a solution...
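    For comparison, here is a plyr-free sketch of the same idea in base R: the column is picked with [[ ]] from a name held in an ordinary string, so no non-standard evaluation (and hence no environment lookup) is involved. The helper name group_mean_by_name is hypothetical, and d.f is rebuilt to match the data used in the other answers (columns a, b, c).

```r
# Mean of the column named `col_name` within each group of `group_col`.
# [[ ]] takes a plain string, so nothing depends on which environment
# the column name was defined in.
group_mean_by_name <- function(x, group_col, col_name) {
  pieces <- split(x, x[[group_col]])  # one data frame per group
  aves <- vapply(pieces,
                 function(piece) mean(piece[[col_name]], na.rm = TRUE),
                 numeric(1))
  out <- data.frame(names(aves), unname(aves), stringsAsFactors = FALSE)
  names(out) <- c(group_col, "Ave")
  out
}

d.f <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))
group_mean_by_name(d.f, "b", "a")  # b = "0", "1"; Ave = 1.5, 3.5
```

    Note that the group column comes back as character (taken from the names of the split), unlike ddply's output.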

  • 2020-11-30 02:18

    You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:

    myFunction <- function(x,y){
        NewColName <- "a"
        z <- do.call("ddply",list(x, y, summarize, Ave = call("mean",as.symbol(NewColName),na.rm=TRUE)))
        return(z)
    }
    
    myFunction(d.f,sv)
      b Ave
    1 0 1.5
    2 1 3.5
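    What makes this work is that call() and as.symbol() turn the string into the unevaluated expression mean(a, na.rm = TRUE) before ddply ever sees it, so summarize never has to look up NewColName. A minimal base-R sketch of just that step, using a small data frame assumed to mirror the question's:

```r
NewColName <- "a"
# call() builds the expression without evaluating it; as.symbol() turns the
# string "a" into the bare name a.
ave_call <- call("mean", as.symbol(NewColName), na.rm = TRUE)
print(ave_call)  # mean(a, na.rm = TRUE)

d.f <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1))
# Evaluating with the data frame as the environment resolves `a` to the column.
eval(ave_call, d.f)                # 2.5
eval(ave_call, d.f[d.f$b == 1, ])  # 3.5 (the b == 1 group)
```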
    
  • 2020-11-30 02:21

    The current solution to this problem is to pass here(summarize) instead of summarize, e.g.:

    myFunction <- function(x, y){
        NewColName = "a"
        z = ddply(x, y, here(summarize),
                Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
        )
        return(z)
    }
    

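    here(f), added to plyr in December 2012, captures the current context (the environment in which here() is called), so summarize can find variables such as NewColName that are defined inside myFunction.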
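    A plyr-free toy sketch of the idea behind here(). It is NOT plyr's source: toy_summarize, call_for_me and toy_here are hypothetical stand-ins for summarize, plyr's internals and here(), and a local number my_offset plays the role of NewColName to keep the lookup simple. The wrapper built by toy_here() is evaluated in the caller's frame, so when toy_summarize falls back to parent.frame(), the lookup continues into my_function's variables.

```r
# Toy stand-in for summarize: look in the data first, then in parent.frame().
toy_summarize <- function(.data, expr) eval(substitute(expr), .data, parent.frame())

# Toy stand-in for plyr's internals: calls .fun on the caller's behalf.
call_for_me <- function(.data, .fun, ...) .fun(.data, ...)

# Toy here(): inline f into `function(...) (f)(...)` and evaluate that in the
# caller's frame, so the wrapper's enclosing environment is the caller's frame.
toy_here <- function(f) {
  eval(substitute(function(...) (f)(...), list(f = f)), parent.frame())
}

my_function <- function(d) {
  my_offset <- 10                     # local, like NewColName in the question
  wrapped <- toy_here(toy_summarize)  # context captured here, inside my_function
  call_for_me(d, wrapped, mean(a) + my_offset)
}

my_function(data.frame(a = c(1, 2, 3, 4)))  # 12.5
```

    Without the toy_here() wrapper, toy_summarize's parent.frame() would be call_for_me's frame, and my_offset would not be found.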

  • 2020-11-30 02:21

    The problem lies in the code of the plyr package itself. In the summarize function, there is a line eval(substitute(...),.data,parent.frame()). It is well known that parent.frame() can do pretty funky and unexpected stuff: it returns the frame of whichever function called summarize, and inside ddply that caller is plyr's own internal code, not your function.

    The solution of @James is a very nice workaround, but if I remember right, @Hadley himself once said that the plyr package was not intended to be used within functions.

    Sorry, I was wrong here. It is known, though, that for the moment the plyr package causes problems in these situations.

    Hence, here is a base R solution to the problem:

    myFunction <- function(x, y){
        NewColName = "a"
        z = aggregate(x[NewColName],x[y],mean,na.rm=TRUE)
        return(z)
    }
    > myFunction(d.f,sv)
      b   a
    1 0 1.5
    2 1 3.5
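    The parent.frame() point made above can be checked directly. In this minimal sketch (who_called, call_for_me and user_fn are hypothetical names), parent.frame() is the user's function when it is called directly, but the helper's frame when a helper calls it on the user's behalf, which is the situation summarize is in under ddply:

```r
# Hypothetical helper playing the role of plyr's internals: calls .fun for us.
call_for_me <- function(.data, .fun, ...) .fun(.data, ...)

# Returns whichever frame parent.frame() reports for this call.
who_called <- function(.data) parent.frame()

user_fn <- function(d) {
  me <- environment()  # this call's own frame
  list(direct     = identical(who_called(d), me),
       via_helper = identical(call_for_me(d, who_called), me))
}

res <- user_fn(data.frame(a = 1:4))
res$direct      # TRUE:  parent.frame() is user_fn's frame
res$via_helper  # FALSE: parent.frame() is call_for_me's frame instead
```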
    
  • 2020-11-30 02:36

    Looks like you have an environment problem. Global assignment fixes the problem, but at the cost of one's soul:

    library(plyr)
    
    a = c(1,2,3,4)
    b = c(0,0,1,1)
    c = c(5,6,7,8)
    d.f = data.frame(a,b,c)
    sv = c("b")
    
    ColName = "a"
    ddply(d.f, sv, summarize,
            Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
    )
    
    myFunction <- function(x, y){
        NewColName <<- "a"
        z = ddply(x, y, summarize,
                Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
        )
        return(z)
    }
    
    myFunction(x=d.f,y=sv)
    

    eval is looking in parent.frame(1), so if you instead define NewColName outside myFunction, it should work:

    rm(NewColName)
    NewColName <- "a"
    myFunction <- function(x, y){
    
        z = ddply(x, y, summarize,
                Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
        )
        return(z)
    }
    myFunction(x=d.f,y=sv)
    

    By using get to pull my.parse out of myFunction's own environment, we can come much closer, but we still have to pass curenv as a global:

    myFunction <- function(x, y){
        NewColName <- "a"
        my.parse <- parse(text=NewColName)
        print(my.parse)
        curenv <<- environment()
        print(curenv)
    
        z = ddply(x, y, summarize,
                Ave = mean( eval( get("my.parse" , envir=curenv ) ), na.rm=TRUE)
        )
        return(z)
    }
    
    > myFunction(x=d.f,y=sv)
    expression(a)
    <environment: 0x0275a9b4>
      b Ave
    1 0 1.5
    2 1 3.5
    

    I suspect that ddply is evaluating in the .GlobalEnv already, which is why all of the parent.frame() and sys.frame() strategies I tried failed.
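    A toy sketch of why the global workarounds succeed. toy_summarize and call_for_me are hypothetical stand-ins for summarize and plyr's internals, and my_offset stands in for NewColName. The fallback frame is the helper's, and in this toy the helper is defined at top level, so the lookup continues into the global environment rather than into the calling function: a local variable is missed while a global one is found.

```r
# Toy stand-in for summarize: look in the data first, then in parent.frame().
toy_summarize <- function(.data, expr) eval(substitute(expr), .data, parent.frame())

# Toy stand-in for plyr's internals, defined at top level, so its enclosing
# environment is the global environment: calls .fun on the caller's behalf.
call_for_me <- function(.data, .fun, ...) .fun(.data, ...)

d <- data.frame(a = c(1, 2, 3, 4))

use_local_offset <- function(d) {
  my_offset <- 10  # local: not on the lookup path from call_for_me's frame
  tryCatch(call_for_me(d, toy_summarize, mean(a) + my_offset),
           error = function(e) conditionMessage(e))
}

before <- use_local_offset(d)  # error message: object 'my_offset' not found
my_offset <- 100               # global, as with the `<<-` workaround
after <- use_local_offset(d)   # 102.5: the global 100 is used, not the local 10
```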
