R: What are the best functions to deal with concatenating and averaging values in a data.frame?

走远了吗. Submitted on 2019-11-30 23:33:10

The plyr package is popular for this, but the base functions by() and aggregate() will also help.

> ddply(my_df, "read_time", function(X) data.frame(OD=mean(X$OD),stdev=sd(X$OD)))
   read_time      OD   stdev
1 2010-02-15 0.15000 0.07071
2 2010-02-16 0.23333 0.15275
3 2010-02-17 0.50000      NA

The last standard deviation is NA because that group contains only one observation. You can add a check that returns 0 instead of NA in that case.
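One way to handle the single-observation case is a small wrapper around sd(). This is only a sketch: the helper name `sd_or_zero` is made up for illustration, and `my_df` is hypothetical data chosen so that it reproduces the means and standard deviations printed above (the original data was not posted). Base R's tapply() is used here so the example runs without extra packages.

```r
# Hypothetical data, chosen to reproduce the summary shown above
my_df <- data.frame(
  read_time = as.Date(c("2010-02-15", "2010-02-15",
                        "2010-02-16", "2010-02-16", "2010-02-16",
                        "2010-02-17")),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)

# sd() returns NA for fewer than two values; fall back to 0 in that case
sd_or_zero <- function(x) if (length(x) < 2) 0 else sd(x)

# Standard deviation of OD per read_time, with 0 for single-row groups
stdev_by_day <- tapply(my_df$OD, my_df$read_time, sd_or_zero)
```

The same helper could be dropped into the ddply() call by writing stdev=sd_or_zero(X$OD) in place of stdev=sd(X$OD).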

Also, you don't need the quotes (on the variables) you had in the data.frame construction.

You can also try the data.table package. If you know MySQL, its query-like syntax should be easy to pick up; otherwise the basics are good enough too ;-)

my_dfdt<-data.table(my_df)
mean<-my_dfdt[,mean(OD), by="read_time"]
sd<-  ..  

You can also compute both in one line, or cbind the two results at the end; it is a matter of style.

Another advantage: it is extremely fast on large data sets. See the documentation for why.
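As a rough illustration of the "both in one line" option, the mean and standard deviation can be returned from a single grouped data.table call. The data below is hypothetical (picked to match the numbers shown elsewhere in this thread), and the result name `stats` and the column name `stdev` are just illustrative choices.

```r
library(data.table)

# Hypothetical data, chosen to match the summaries shown in this thread
my_df <- data.frame(
  read_time = as.Date(c("2010-02-15", "2010-02-15",
                        "2010-02-16", "2010-02-16", "2010-02-16",
                        "2010-02-17")),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)
my_dfdt <- data.table(my_df)

# One grouped call that returns both statistics per read_time
stats <- my_dfdt[, .(OD = mean(OD), stdev = sd(OD)), by = read_time]
```

This avoids the separate cbind step and also avoids naming the results `mean` and `sd`, which would shadow the base functions of the same name.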

This illustrates how you could use aggregate to get the mean and standard deviation by your read_time.

> aggregate(my_df$OD, by=list(my_df$read_time), function(x) mean(x))
     Group.1         x
1 2010-02-15 0.1500000
2 2010-02-16 0.2333333
3 2010-02-17 0.5000000

> aggregate(my_df$OD, by=list(my_df$read_time), function(x) sd(x))
     Group.1          x
1 2010-02-15 0.07071068
2 2010-02-16 0.15275252
3 2010-02-17         NA
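If two separate calls feel repetitive, aggregate() can also return both statistics at once when the function returns a named vector. The following is a sketch using the formula interface, with hypothetical data that reproduces the numbers above; the result name `res` is an arbitrary choice.

```r
# Hypothetical data, chosen to reproduce the output shown above
my_df <- data.frame(
  read_time = as.Date(c("2010-02-15", "2010-02-15",
                        "2010-02-16", "2010-02-16", "2010-02-16",
                        "2010-02-17")),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)

# FUN returns a named vector, so the OD column of the result is a
# two-column matrix with columns "mean" and "sd"
res <- aggregate(OD ~ read_time, data = my_df,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
```

Note that res$OD is a matrix column here, so individual values are accessed as res$OD[, "mean"] and res$OD[, "sd"].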