How can I, in R calculate the overall variance and the variance for each group from a dataset that looks like this (for example):
Group Count Value
A 3 5
A 2 8
B 1 11
B 3 15
I know to calculate the variance as a whole, ignoring the groups I would do:
var(rep(x$Value, x$Count)),
but how do I automatically calculate the variance for each group accounting for the frequency? E.g., the variance for group A, group B, etc.,.. I would like my output to have the following headers:
Group, Total Count, Group Variance
I have also reviewed this link; R computing mean, median, variance from file with frequency distribution which is different (does not have the group component) so this is not a duplicate.
thank you for all of the help.
One option is using data.table. Convert the data.frame to data.table (setDT) and get the var of "Value" and sum of "Count" by "Group".
library(data.table)
setDT(df1)[, list(GroupVariance=var(rep(Value, Count)),
TotalCount=sum(Count)) , by = Group]
# Group GroupVariance TotalCount
#1: A 2.7 5
#2: B 4.0 4
a similar way using dplyr is
library(dplyr)
group_by(df1, Group) %>%
summarise(GroupVariance=var(rep(Value,Count)), TotalCount=sum(Count))
# Group GroupVariance TotalCount
#1 A 2.7 5
#2 B 4.0 4
Here's a quick wrapper with base R. First step is to grow your data set by Count, and then calculate variance by group
df1 <- df[rep(seq_len(nrow(df)), df$Count), ]
with(df1, tapply(Value, Group, var))
# A B
# 2.7 4.0
Or similarly
aggregate(Value ~ Group, df1, function(x) c(Var = var(x), Count = length(x)))
# Group Value.Var Value.Count
# 1 A 2.7 5.0
# 2 B 4.0 4.0
来源:https://stackoverflow.com/questions/28659723/calculating-grouped-variance-from-a-frequency-table-in-r