R: What are the best functions to deal with concatenating and averaging values in a data.frame?

时光取名叫无心 2021-01-06 19:59

I have a data.frame from this code:

   my_df = data.frame("read_time" = c("2010-02-15", "2010-02-15", 
                                      "2010-02-         


        
3 Answers
  • 2021-01-06 20:19

    The plyr package is popular for this, but the base functions by() and aggregate() will also help.

    > library(plyr)
    > ddply(my_df, "read_time", function(X) data.frame(OD=mean(X$OD),stdev=sd(X$OD)))
       read_time      OD   stdev
    1 2010-02-15 0.15000 0.07071
    2 2010-02-16 0.23333 0.15275
    3 2010-02-17 0.50000      NA
    

    The last standard deviation is NA because sd() is undefined for a group with a single observation. You can add a check that returns 0 instead of NA in that case.

    Also, you don't need the quotes around the column names that you used in the data.frame() call.
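
The "missing bit" for the NA case could look roughly like the sketch below, which uses base R only. The helper name `sd_or_zero` is hypothetical, and the sample values are an assumption: the question's own data is cut off, so these were picked to reproduce the means and standard deviations printed above.

```r
# Hypothetical data: values chosen to match the printed summaries.
my_df <- data.frame(
  read_time = c("2010-02-15", "2010-02-15",
                "2010-02-16", "2010-02-16", "2010-02-16",
                "2010-02-17"),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)

# Hypothetical helper: sd() gives NA for fewer than two values,
# so return 0 in that case and the usual sd() otherwise.
sd_or_zero <- function(x) if (length(x) < 2) 0 else sd(x)

# Per-date standard deviation, returned as a named vector.
stdev_by_date <- tapply(my_df$OD, my_df$read_time, sd_or_zero)
print(stdev_by_date)
```

The same helper should work inside the ddply() call by writing stdev=sd_or_zero(X$OD) in place of stdev=sd(X$OD).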

  • 2021-01-06 20:28

    This illustrates how you could use aggregate to get the mean and standard deviation by your read_time.

    >aggregate(my_df$OD, by=list(my_df$read_time), function(x) mean(x))
    
         Group.1         x
    1 2010-02-15 0.1500000
    2 2010-02-16 0.2333333
    3 2010-02-17 0.5000000
    
    
    >aggregate(my_df$OD, by=list(my_df$read_time), function(x) sd(x))
         Group.1          x
    1 2010-02-15 0.07071068
    2 2010-02-16 0.15275252
    3 2010-02-17         NA
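
The two aggregate() calls above can probably be folded into one by letting the summary function return both statistics. This is a sketch under assumptions: the sample values are hypothetical (chosen to reproduce the output above, since the question's data is truncated), and `summary_df` is just an illustrative name.

```r
# Hypothetical data: values chosen to match the printed summaries.
my_df <- data.frame(
  read_time = c("2010-02-15", "2010-02-15",
                "2010-02-16", "2010-02-16", "2010-02-16",
                "2010-02-17"),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)

# Formula interface: summarise OD within each read_time,
# returning mean and sd together as a named vector.
summary_df <- aggregate(OD ~ read_time, data = my_df,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores the two statistics as one matrix column named OD;
# do.call(data.frame, ...) flattens it into columns OD.mean and OD.sd.
summary_df <- do.call(data.frame, summary_df)
print(summary_df)
```

The single-observation date still gets NA for its standard deviation here, as in the separate calls.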
    
  • 2021-01-06 20:34

    You can try the data.table package. If you know MySQL, its query-like syntax should be easy to pick up; otherwise the basics are good enough too ;-)

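    library(data.table)
    my_dfdt <- data.table(my_df)
    # avoid naming results `mean` or `sd`, which would mask the base functions
    mean_by_date <- my_dfdt[, mean(OD), by="read_time"]
    sd_by_date <- ..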
    

    You can also compute both in one line, or cbind the two results at the end; it is a matter of style.

    Another advantage: it is extremely fast on large data sets. See the documentation for why.
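
As a rough illustration of the "both in one line" variant, the sketch below computes the mean and standard deviation per date in a single data.table call. It assumes the data.table package is installed; the sample values are hypothetical (the question's data is truncated), and the names `my_dt`, `mean_OD`, and `sd_OD` are illustrative only.

```r
library(data.table)

# Hypothetical data: values chosen to match the summaries shown in the other answers.
my_dt <- data.table(
  read_time = c("2010-02-15", "2010-02-15",
                "2010-02-16", "2010-02-16", "2010-02-16",
                "2010-02-17"),
  OD = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5)
)

# .() builds a list of named summary columns, evaluated once per read_time group.
summary_dt <- my_dt[, .(mean_OD = mean(OD), sd_OD = sd(OD)), by = read_time]
print(summary_dt)
```

Groups come back in order of first appearance, and the date with a single reading still has NA for its standard deviation.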
