How to calculate mean/median per group in a dataframe in R [duplicate]

Anonymous (unverified), submitted on 2019-12-03 02:49:01

Question:

I have a dataframe recording in detail how much money each customer spends, like the following:

    custid, value
    1, 1
    1, 3
    1, 2
    1, 5
    1, 4
    1, 1
    2, 1
    2, 10
    3, 1
    3, 2
    3, 5

How can I calculate characteristics such as mean, max, median, std, etc. per customer, like the following? Should I use some apply function? If so, how?

    custid, mean, max, min, median, std
    1, ....
    2, ....
    3, ....
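For anyone who wants to reproduce the answers, here is a minimal sketch that builds the sample data. The object name `dat` is borrowed from some of the answers (others call the same data `mydf`, `df`, or `dataframe`), and storing both columns as integers is an assumption.

```r
# Sample data from the question; integer storage is an assumption.
dat <- data.frame(
  custid = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  value  = c(1L, 3L, 2L, 5L, 4L, 1L, 1L, 10L, 1L, 2L, 5L)
)

# Inspect the structure: 11 observations of 2 variables.
str(dat)
```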

Answer 1:

To add to the alternatives, here's summaryBy from the "doBy" package, with which you can specify a list of functions to apply.

    library(doBy)
    summaryBy(value ~ custid, data = mydf,
              FUN = list(mean, max, min, median, sd))
    #   custid value.mean value.max value.min value.median value.sd
    # 1      1   2.666667         5         1          2.5 1.632993
    # 2      2   5.500000        10         1          5.5 6.363961
    # 3      3   2.666667         5         1          2.0 2.081666

Of course, you can also stick with base R:

myFun 
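As a minimal sketch of what staying in base R could look like (an assumption, not necessarily the answerer's code), a helper can return all statistics for one group and `tapply` can apply it per customer. The helper name `group_stats` is hypothetical, and the data frame is rebuilt here so the block runs on its own.

```r
# Sample data from the question (rebuilt so the sketch is self-contained).
mydf <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

# Hypothetical helper: a named vector of summary statistics for one group.
group_stats <- function(x) {
  c(mean = mean(x), max = max(x), min = min(x),
    median = median(x), std = sd(x))
}

# tapply() yields one named vector per custid; rbind stacks them into a
# matrix whose row names are the custid values.
res <- do.call(rbind, tapply(mydf$value, mydf$custid, group_stats))
res
```

The result is a numeric matrix; wrap it in `as.data.frame()` if a data frame is preferred.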


Answer 2:

    library(dplyr)
    dat %>%
      group_by(custid) %>%
      summarise(Mean = mean(value), Max = max(value), Min = min(value),
                Median = median(value), Std = sd(value))
    #  custid     Mean Max Min Median      Std
    #1      1 2.666667   5   1    2.5 1.632993
    #2      2 5.500000  10   1    5.5 6.363961
    #3      3 2.666667   5   1    2.0 2.081666

For bigger datasets, data.table would be faster:

    library(data.table)
    setDT(dat)[, list(Mean = mean(value), Max = max(value), Min = min(value),
                      Median = as.numeric(median(value)), Std = sd(value)),
               by = custid]
    #   custid     Mean Max Min Median      Std
    #1:      1 2.666667   5   1    2.5 1.632993
    #2:      2 5.500000  10   1    5.5 6.363961
    #3:      3 2.666667   5   1    2.0 2.081666
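The `as.numeric()` around `median()` is worth a note. Assuming `value` is stored as integer (the question does not say), base R's `median()` returns an integer for odd-sized groups and a double for even-sized ones, while data.table expects each result column to keep one type across groups. A small base R sketch of the type difference:

```r
# median() on integer input: the result type depends on the group size.
odd_group  <- c(1L, 2L, 5L)   # values of custid 3 in the sample data
even_group <- c(1L, 10L)      # values of custid 2 in the sample data

typeof(median(odd_group))              # "integer": the middle element is returned
typeof(median(even_group))             # "double": the two middle elements are averaged
typeof(as.numeric(median(odd_group)))  # "double": coerced, so all groups agree
```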


Answer 3:

If you want to apply a larger number of functions to all or the same column(s) with dplyr, I recommend summarise_each or mutate_each:

    require(dplyr)
    dat %>%
      group_by(custid) %>%
      summarise_each(funs(max, min, mean, median, sd), value)
    #Source: local data frame [3 x 6]
    #
    #  custid max min     mean median       sd
    #1      1   5   1 2.666667    2.5 1.632993
    #2      2  10   1 5.500000    5.5 6.363961
    #3      3   5   1 2.666667    2.0 2.081666
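In more recent dplyr releases, `summarise_each()` and `funs()` are deprecated, and `across()` (available from dplyr 1.0.0) is the suggested replacement. Here is a minimal sketch of an equivalent call, assuming dplyr >= 1.0.0 is installed; the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Sample data from the question (rebuilt so the sketch is self-contained).
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

# A named list of functions; across() names each result <column>_<function>.
res <- dat %>%
  group_by(custid) %>%
  summarise(across(value, list(max = max, min = min, mean = mean,
                               median = median, sd = sd)))
res
```

Unlike the `summarise_each` output, the columns come back prefixed with the source column, for example `value_max` and `value_sd`.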

Or another option with base R's aggregate:

    aggregate(value ~ custid, data = dat, summary)
    #  custid value.Min. value.1st Qu. value.Median value.Mean value.3rd Qu. value.Max.
    #1      1      1.000         1.250        2.500      2.667         3.750      5.000
    #2      2      1.000         3.250        5.500      5.500         7.750     10.000
    #3      3      1.000         1.500        2.000      2.667         3.500      5.000

(This doesn't include standard deviation but I think it's a nice approach for the other descriptive stats.)
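If the standard deviation is needed as well, one option is to hand `aggregate` a custom function instead of `summary`. This is a sketch rather than part of the original answer, and the helper name `stats_with_sd` is hypothetical. `aggregate` stores a multi-value result as a single matrix column, so `do.call(data.frame, ...)` is used to flatten it into ordinary columns.

```r
# Sample data from the question (rebuilt so the sketch is self-contained).
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

# Hypothetical helper returning the statistics asked for in the question.
stats_with_sd <- function(x) {
  c(mean = mean(x), max = max(x), min = min(x),
    median = median(x), sd = sd(x))
}

# agg has two columns: custid, and a 3 x 5 matrix named value.
agg <- aggregate(value ~ custid, data = dat, FUN = stats_with_sd)

# Flatten the matrix column into value.mean, value.max, and so on.
flat <- do.call(data.frame, agg)
flat
```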



Answer 4:

I like describeBy() from the psych package, like this:

df 

Or get it as a matrix if you prefer that:

    describeBy(df$value, df$custid., mat = TRUE, skew = FALSE)
    #    item group1 vars n     mean       sd median min max range        se
    # 11    1      1    1 6 2.666667 1.632993    2.5   1   5     4 0.6666667
    # 12    2      2    1 2 5.500000 6.363961    5.5   1  10     9 4.5000000
    # 13    3      3    1 3 2.666667 2.081666    2.0   1   5     4 1.2018504


Answer 5:

You can use the plyr package, which implements the split-apply-combine strategy:

    ddply(dataframe, .(groupcol), function)

In your case:

    library(plyr)
    ddply(dataframe, .(custid), summarize, mean = mean(value), median = median(value))

Take a look at the help page for ddply (?ddply); it has a good example.
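To make the split-apply-combine idea concrete, here is a base R sketch of the three steps that `ddply` carries out in a single call. It illustrates the strategy only and is not plyr's actual implementation.

```r
# Sample data from the question (rebuilt so the sketch is self-contained).
dataframe <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

# 1. Split: one data frame per custid.
pieces <- split(dataframe, dataframe$custid)

# 2. Apply: reduce each piece to a one-row data frame of statistics.
summaries <- lapply(pieces, function(piece) {
  data.frame(custid = piece$custid[1],
             mean   = mean(piece$value),
             median = median(piece$value))
})

# 3. Combine: stack the one-row results back into a single data frame.
result <- do.call(rbind, summaries)
result
```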


