subset inside a function by the variables specified in ddply

折月煮酒 submitted on 2019-12-04 01:38:51

Question


I often need to subset one data.frame inside a function, using the same variables by which I am splitting another data.frame with ddply. To do that, I currently write the variables out again inside the function, and I wonder whether there is a more elegant way. Below is a trivial example that shows my current approach.

library(plyr)

d1 <- expand.grid(x=c('a','b'), y=c('c','d'), z=1:3)
d2 <- expand.grid(x=c('a','b'), y=c('c','d'), z=4:6)

results <- ddply(d1, .(x,y), function(d) {
   # rows of d2 that belong to the same (x, y) group as the current piece d
   d2Sub <- subset(d2, x==unique(d$x) & y==unique(d$y))
   out <- d$z + d2Sub$z
   data.frame(out)
 })
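One way to avoid writing the grouping variables twice, sketched here under the assumption that all grouping columns exist in both data frames with the same factor levels, is to keep them in a character vector and reuse it: ddply accepts a character vector for .variables, and the same vector can drive the row filter on d2. The names vars and keep are illustrative, not part of plyr.

```r
library(plyr)

d1 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 1:3)
d2 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 4:6)

# Grouping variables written once (hypothetical name), reused below
vars <- c("x", "y")

results <- ddply(d1, vars, function(d) {
  # One logical vector per grouping variable, combined with AND:
  # TRUE for rows of d2 whose values match this piece's values
  # (assumes both data frames share the same factor levels)
  keep <- Reduce(`&`, lapply(vars, function(v) d2[[v]] == d[[v]][1]))
  data.frame(out = d$z + d2$z[keep])
})
```

Adding or removing a grouping variable then only requires changing vars.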

Answer 1:


The plyr package offers functions that make the whole split/apply/combine construct easy. To my knowledge, however, each of them splits only a single object: a list, a data.frame, or an array.

In your case, what you are trying to do is split two objects, then mapply (or Map), then recombine. Since plyr does not have a ready solution for this more complicated construct, you could do it in base R. That's how I assume people were doing things before plyr came out:

# split
d1.split <- split(d1, list(d1$x, d1$y))
d2.split <- split(d2, list(d2$x, d2$y))

# apply
res.split <- Map(function(df1, df2) data.frame(x = df1$x, y = df1$y,
                                               out = df1$z + df2$z),
                 d1.split, d2.split, USE.NAMES = FALSE)

#  combine
res <- do.call(rbind, res.split)
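The Map call pairs the pieces of d1.split and d2.split by position. That works here because d1 and d2 have the same x and y levels, so split() returns their groups in the same order. If that might not hold, a minimal sketch that pairs groups by name instead could look like this (common is an illustrative name); groups present in only one data frame are simply dropped:

```r
d1 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 1:3)
d2 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 4:6)

# split: group names are interaction labels such as "a.c"
d1.split <- split(d1, list(d1$x, d1$y))
d2.split <- split(d2, list(d2$x, d2$y))

# apply: match groups by name rather than by position
common <- intersect(names(d1.split), names(d2.split))
res.split <- lapply(common, function(nm) {
  df1 <- d1.split[[nm]]
  df2 <- d2.split[[nm]]
  data.frame(x = df1$x, y = df1$y, out = df1$z + df2$z)
})

# combine
res <- do.call(rbind, res.split)
```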

It is up to you to decide whether this is more elegant than your current approach. The intermediate assignments are only there to aid comprehension; if you prefer, you can write the whole thing as a single res <- do.call(rbind, Map(FUN, split(d1, ...), split(d2, ...), ...)) statement.
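For reference, the single-statement form could look like the following, with the answer's anonymous function passed inline as FUN and the same split() calls as arguments. This is a sketch; it should produce the same res as the step-by-step version.

```r
d1 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 1:3)
d2 <- expand.grid(x = c('a', 'b'), y = c('c', 'd'), z = 4:6)

# split, apply and combine in one expression
res <- do.call(rbind, Map(function(df1, df2) data.frame(x = df1$x, y = df1$y,
                                                         out = df1$z + df2$z),
                          split(d1, list(d1$x, d1$y)),
                          split(d2, list(d2$x, d2$y)),
                          USE.NAMES = FALSE))
```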



Source: https://stackoverflow.com/questions/20171053/subset-inside-a-function-by-the-variables-specified-in-ddply
