Question
This question already has an answer here:
- Mean per group in a data.frame [duplicate] 8 answers
I have a data frame recording in detail how much money each customer spends, like the following:
custid, value
1, 1
1, 3
1, 2
1, 5
1, 4
1, 1
2, 1
2, 10
3, 1
3, 2
3, 5
How can I calculate per-customer characteristics such as mean, max, min, median, std, etc., like the following? Should I use one of the apply functions? If so, how?
custid, mean, max,min,median,std
1, ....
2,....
3,....
Answer 1:
To add to the alternatives, here's summaryBy from the "doBy" package, with which you can specify a list of functions to apply.
library(doBy)
summaryBy(value ~ custid, data = mydf,
FUN = list(mean, max, min, median, sd))
# custid value.mean value.max value.min value.median value.sd
# 1 1 2.666667 5 1 2.5 1.632993
# 2 2 5.500000 10 1 5.5 6.363961
# 3 3 2.666667 5 1 2.0 2.081666
Of course, you can also stick with base R:
myFun <- function(x) {
c(min = min(x), max = max(x),
mean = mean(x), median = median(x),
std = sd(x))
}
tapply(mydf$value, mydf$custid, myFun)
# $`1`
# min max mean median std
# 1.000000 5.000000 2.666667 2.500000 1.632993
#
# $`2`
# min max mean median std
# 1.000000 10.000000 5.500000 5.500000 6.363961
#
# $`3`
# min max mean median std
# 1.000000 5.000000 2.666667 2.000000 2.081666
# tapply() orders its groups by sorted custid, so sort the ids to keep rows aligned
cbind(custid = sort(unique(mydf$custid)),
do.call(rbind, tapply(mydf$value, mydf$custid, myFun)))
# custid min max mean median std
# 1 1 1 5 2.666667 2.5 1.632993
# 2 2 1 10 5.500000 5.5 6.363961
# 3 3 1 5 2.666667 2.0 2.081666
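A variation on the base R approach above, as a minimal sketch: split() plus sapply() builds the same table, and taking custid from the row names keeps each id attached to its own statistics even when the input is not sorted. The sample data is the question's data, shuffled on purpose; the names stats and result are only illustrative.

```r
# The question's data, deliberately shuffled so custid is not in sorted order.
mydf <- data.frame(
  custid = c(3, 1, 2, 1, 1, 3, 1, 2, 1, 3, 1),
  value  = c(1, 1, 1, 3, 2, 2, 5, 10, 4, 5, 1)
)

myFun <- function(x) {
  c(min = min(x), max = max(x),
    mean = mean(x), median = median(x),
    std = sd(x))
}

# split() makes one vector of values per custid; sapply() returns a matrix
# with one column per group, so transpose it to get one row per group.
stats <- t(sapply(split(mydf$value, mydf$custid), myFun))

# Row names of 'stats' are the group ids, so they cannot get out of step.
result <- data.frame(custid = as.numeric(rownames(stats)), stats,
                     row.names = NULL)
print(result)
```

Because the ids come from the result itself rather than from a separate unique() call, the order of the rows in mydf does not matter.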
Answer 2:
library(dplyr)
dat%>%
group_by(custid)%>%
summarise(Mean=mean(value), Max=max(value), Min=min(value), Median=median(value), Std=sd(value))
# custid Mean Max Min Median Std
#1 1 2.666667 5 1 2.5 1.632993
#2 2 5.500000 10 1 5.5 6.363961
#3 3 2.666667 5 1 2.0 2.081666
For bigger datasets, data.table would be faster:
library(data.table)
setDT(dat)[,list(Mean=mean(value), Max=max(value), Min=min(value), Median=as.numeric(median(value)), Std=sd(value)), by=custid]
# custid Mean Max Min Median Std
#1: 1 2.666667 5 1 2.5 1.632993
#2: 2 5.500000 10 1 5.5 6.363961
#3: 3 2.666667 5 1 2.0 2.081666
Answer 3:
If you want to apply a larger number of functions to all or the same column(s) with dplyr, I recommend summarise_each or mutate_each:
require(dplyr)
dat %>%
group_by(custid) %>%
summarise_each(funs(max, min, mean, median, sd), value)
#Source: local data frame [3 x 6]
#
# custid max min mean median sd
#1 1 5 1 2.666667 2.5 1.632993
#2 2 10 1 5.500000 5.5 6.363961
#3 3 5 1 2.666667 2.0 2.081666
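In recent dplyr releases summarise_each() and funs() are deprecated in favour of across(). A rough equivalent of the call above might look like the sketch below. It assumes dplyr 1.0.0 or later, and the object names dat and res are only illustrative.

```r
library(dplyr)

# Sample data from the question.
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

res <- dat %>%
  group_by(custid) %>%
  summarise(
    # Apply every function in the named list to 'value';
    # .names = "{.fn}" names each output column after its function.
    across(value,
           list(max = max, min = min, mean = mean, median = median, sd = sd),
           .names = "{.fn}")
  )

print(res)
```

Without the .names argument the columns would be called value_max, value_min and so on, which is handy when several columns are summarised at once.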
Or another option with base R's aggregate:
aggregate(value ~ custid, data = dat, summary)
# custid value.Min. value.1st Qu. value.Median value.Mean value.3rd Qu. value.Max.
#1 1 1.000 1.250 2.500 2.667 3.750 5.000
#2 2 1.000 3.250 5.500 5.500 7.750 10.000
#3 3 1.000 1.500 2.000 2.667 3.500 5.000
(This doesn't include standard deviation but I think it's a nice approach for the other descriptive stats.)
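If the standard deviation is needed as well, aggregate also accepts a custom function that returns a named vector. A small sketch, where the names agg and flat are only illustrative: the statistics come back packed into a single matrix column, so do.call(data.frame, ...) is used here to flatten them into ordinary columns.

```r
# Sample data from the question.
dat <- data.frame(
  custid = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3),
  value  = c(1, 3, 2, 5, 4, 1, 1, 10, 1, 2, 5)
)

# FUN returns a named vector, so aggregate() stores all statistics
# in one matrix column called 'value'.
agg <- aggregate(value ~ custid, data = dat,
                 FUN = function(x) c(mean = mean(x),
                                     median = median(x),
                                     sd = sd(x)))

# Flatten the matrix column into separate data frame columns.
flat <- do.call(data.frame, agg)
print(flat)
```

After flattening, the columns are prefixed with the variable name (value.mean, value.median, value.sd), which matches the naming in the summary output above.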
Answer 4:
I like describeBy() from the psych package. Like this:
df <- structure(list(custid. = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L,
3L, 3L), value = c(1L, 3L, 2L, 5L, 4L, 1L, 1L, 10L, 1L, 2L, 5L
)), .Names = c("custid.", "value"), class = "data.frame", row.names = c(NA,
-11L))
df
custid. value
1 1 1
2 1 3
3 1 2
4 1 5
5 1 4
6 1 1
7 2 1
8 2 10
9 3 1
10 3 2
11 3 5
# install.packages(c("psych"), dependencies = TRUE)
require(psych)
describeBy(df$value, df$custid.)
group: 1
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 6 2.67 1.63 2.5 2.67 2.22 1 5 4 0.21 -1.86 0.67
-----------------------------------------------------------------------
group: 2
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 2 5.5 6.36 5.5 5.5 6.67 1 10 9 0 -2.75 4.5
-----------------------------------------------------------------------
group: 3
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 3 2.67 2.08 2 2.67 1.48 1 5 4 0.29 -2.33 1.2
Or get it as a matrix if you prefer that:
describeBy(df$value, df$custid., mat = TRUE, skew = FALSE)
item group1 vars n mean sd median min max range se
11 1 1 1 6 2.666667 1.632993 2.5 1 5 4 0.6666667
12 2 2 1 2 5.500000 6.363961 5.5 1 10 9 4.5000000
13 3 3 1 3 2.666667 2.081666 2.0 1 5 4 1.2018504
Answer 5:
You can use the plyr package, which implements the split-apply-combine strategy:
ddply(dataframe, .(groupcol), function)
In your case:
library(plyr)
ddply(dataframe, .(custid), summarize, "mean" = mean(value), "median" = median(value))
Take a look at the help for ddply (?ddply); it has a good example.
Source: https://stackoverflow.com/questions/25198442/how-to-calculate-mean-median-per-group-in-a-dataframe-in-r