Collapse data frame by group using different functions on each variable

不羁岁月 submitted on 2019-12-24 10:28:00

Question


Define

df<-read.table(textConnection('egg 1 20 a
                        egg 2 30 a
                        jap 3 50 b
                        jap 1 60 b'))

such that

> df
   V1 V2 V3 V4
1 egg  1 20  a
2 egg  2 30  a
3 jap  3 50  b
4 jap  1 60  b

My real data has no factors, so I convert the factor columns to character:

> df$V1 <- as.character(df$V1)
> df$V4 <- as.character(df$V4)  

I would like to "collapse" the data frame by V1 keeping:

  • The max of V2
  • The mean of V3
  • The mode of V4 (this value does not actually change within V1 groups, so first, last, etc. would also do.)

Please note that this is a general question: my dataset is much larger, and I may want to use different functions (e.g. last, first, min, max, variance, standard deviation) for different variables when collapsing. Hence the functions argument could be quite long.

In this case I would want output of the form:

> df.collapse
   V1 V2 V3 V4
1 egg  2 25  a
2 jap  3 55  b

Answer 1:


The plyr package will help you:

library(plyr)
ddply(df, .(V1), summarize, V2 = max(V2), V3 = mean(V3), V4 = toupper(V4)[1])

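R has no built-in function for the statistical mode (base R's mode() reports an object's storage mode instead), so I used another function, toupper(V4)[1], as a stand-in. A mode function is easy to implement, though.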
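
A minimal sketch of such a mode function, assuming ties should go to the value that appears first. The name stat_mode is hypothetical; it is not part of base R or plyr:

```r
# Hypothetical helper: most frequent value of a vector.
# Ties are broken in favour of the value that appears first.
stat_mode <- function(x) {
  ux <- unique(x)                    # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))   # how often each distinct value occurs
  ux[which.max(counts)]              # which.max returns the first maximum
}

stat_mode(c("a", "b", "a"))
```

It could then take the place of the stand-in in the ddply call, e.g. V4 = stat_mode(V4).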




Answer 2:


I would suggest using ddply from plyr:

require(plyr)
ddply(df, .(V1), summarise, V2=max(V2), V3=mean(V3), V4=V4[1])

You can replace the functions with any calculation you wish. Your V4 column is non-numeric, so you might want to convert it to a numeric and then compute the mode. For now I am just returning the V4 value of the first row of each split. Or, if you don't want to use plyr:

# Split by V1, summarise each piece, then stack the one-row results.
# The group labels (egg, jap) end up as the row names of the result.
do.call(rbind, lapply(split(df, df$V1), function(x) {
    data.frame(V2=max(x$V2), V3=mean(x$V3), V4=x$V4[1])
}))
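
If the list of functions gets long, as the question anticipates, the split/lapply approach can be driven by a named list with one function per column. This is only a sketch under assumptions: funs and collapse_by are hypothetical names introduced here, not part of base R or plyr, and the data frame is rebuilt by hand from the question's example.

```r
# Example data from the question, with strings kept as character.
df <- data.frame(V1 = c("egg", "egg", "jap", "jap"),
                 V2 = c(1, 2, 3, 1),
                 V3 = c(20, 30, 50, 60),
                 V4 = c("a", "a", "b", "b"),
                 stringsAsFactors = FALSE)

# One summary function per column to collapse (hypothetical name).
funs <- list(V2 = max, V3 = mean, V4 = function(v) v[1])

# Hypothetical helper: apply funs[[col]] to column `col` within each group.
collapse_by <- function(data, by, funs) {
  pieces <- lapply(split(data, data[[by]]), function(g) {
    vals <- lapply(names(funs), function(col) funs[[col]](g[[col]]))
    names(vals) <- names(funs)
    key <- setNames(list(g[[by]][1]), by)   # keep the grouping value as a column
    data.frame(c(key, vals), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL                      # drop the group labels as row names
  out
}

df.collapse <- collapse_by(df, "V1", funs)
df.collapse
```

Adding another summary is then a matter of adding one entry to funs, which keeps the call itself short however many columns are involved.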


Source: https://stackoverflow.com/questions/6510390/collapse-data-frame-by-group-using-different-functions-on-each-variable
