How can I use variable names to refer to data frame columns with ddply?

假如想象 submitted on 2019-12-03 16:58:56

The arguments to ddply are expressions that are evaluated in the context of each piece the original data frame is split into. Your df[myval] addresses the whole data frame, so you cannot pass it as-is. (By the way, why do you need the as.numeric(as.character()) calls? Unless the column is a factor, they do nothing useful.)
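To make the "whole data frame versus piece" distinction concrete, here is a minimal base-R sketch. The toy data frame sales_df and the variable mycol are made up for illustration, and split() only approximates the splitting step that ddply performs internally.

```r
# Toy data (hypothetical, not taken from the question)
sales_df <- data.frame(
  year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  sales = c(10, 20, 30, 1, 2, 3)
)
mycol <- "sales"

# Indexing the original data frame always yields every row
whole <- sales_df[[mycol]]

# Roughly what ddply(.(year), ...) does first: one piece per year
pieces <- split(sales_df, sales_df$year)
piece_sizes <- sapply(pieces, nrow)

# A per-year computation therefore has to index the piece, not sales_df
per_year <- lapply(pieces, function(piece) cumsum(piece[[mycol]]))
print(per_year)
```

Here whole has six elements while each piece has only three rows, which is why an expression that refers to the full data frame does not fit into a per-year result.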

The easiest way is to write your own function that does everything inside, and pass the column name down to it, e.g.

library(plyr)

# colname is forwarded through ddply's ... to the anonymous function
df <- ddply(df, 
            .(year), 
            .fun = function(x, colname) transform(x, cum_sales = cumsum(x[, colname])), 
            colname = "sales")
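For anyone who wants to try this without the data from the question, here is a self-contained sketch of the same idea. The data frame sales_df and the helper add_cumsum are hypothetical names chosen for illustration, and the helper assigns the new column directly instead of going through transform.

```r
library(plyr)

# Toy data (hypothetical): two years with three sales figures each
sales_df <- data.frame(
  year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  sales = c(10, 20, 30, 1, 2, 3)
)

# Hypothetical helper: add a running total of the column named by colname.
# piece is the subset of rows for one year.
add_cumsum <- function(piece, colname) {
  piece$cum_sales <- cumsum(piece[[colname]])
  piece
}

# The column name lives in an ordinary character variable
mycol <- "sales"
result <- ddply(sales_df, .(year), add_cumsum, colname = mycol)
print(result)
```

Because the column is looked up with piece[[colname]], the name can come from any character variable and no expression has to be constructed.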

The problem is that ddply expects its last arguments to be expressions that will be evaluated on chunks of the data.frame (one per year, in your example). If you use df[myval], you get the whole data.frame, not the annual chunks.

The following works, but is not very elegant: I build the expression as a string, and then convert it with eval(parse(...)).

ddply( df, .(year), transform, 
  cum_value2 = eval(parse( text = 
    sprintf( "cumsum(as.numeric(as.character(%s)))", mycol )
  ))
)
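To see what the string-built expression does on each chunk, the sketch below parses it once and evaluates it explicitly with the chunk as the evaluation environment. This is a variant of the approach above rather than the same code: the toy data frame sales_df is hypothetical, the as.numeric(as.character()) wrapper is left out because the toy column is already numeric, and an anonymous function replaces transform.

```r
library(plyr)

# Toy data (hypothetical): two years with three sales figures each
sales_df <- data.frame(
  year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  sales = c(10, 20, 30, 1, 2, 3)
)
mycol <- "sales"

# Build the expression text once; here it is "cumsum(sales)"
expr_text <- sprintf("cumsum(%s)", mycol)
expr <- parse(text = expr_text)

# Evaluate the parsed expression inside each yearly piece, so that
# the name sales resolves to that piece's column only
result <- ddply(sales_df, .(year), function(piece) {
  piece$cum_value2 <- eval(expr, envir = piece)
  piece
})
print(result)
```

Passing envir = piece makes it explicit where the column name is looked up, at the cost of a slightly longer call.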