Question
I want to calculate mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").
The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should have a value corresponding to the current group value - the data set should not be collapsed to one row per group. For example, consider group mean:
Before
id group value
1 a 10
2 a 20
3 b 100
4 b 200
After
id group value grp.mean.values
1 a 10 15
2 a 20 15
3 b 100 150
4 b 200 150
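To try the answers below, the Before table can be rebuilt as a small data frame. This is only a minimal sketch; the object name df is an assumption taken from the first two answers (the later answers call the same data dat and x).

```r
# Rebuild the "Before" table from the question.
# The object name `df` is an assumption; answers 3 and 4 use `dat` and `x`.
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)
df
```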
Answer 1:
Have a look at the ave function. Something like:
df$grp.mean.values <- ave(df$value, df$group)
By default, ave computes the mean. If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min:
df$grp.min <- ave(df$value, df$group, FUN = min)
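As a rough illustration of the point above, the sketch below runs ave with a few different FUN values on the sample data from the question. The column names grp.max and grp.n are made up for this example. Note that FUN has to be passed by name, because unnamed arguments after the first are treated as grouping variables.

```r
# Sample data from the question (the name `df` is an assumption).
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# Without FUN, ave() falls back to its default, mean.
df$grp.mean.values <- ave(df$value, df$group)

# FUN must be named; an unnamed extra argument would be read as a grouping variable.
df$grp.max <- ave(df$value, df$group, FUN = max)     # hypothetical column name
df$grp.n   <- ave(df$value, df$group, FUN = length)  # hypothetical column name

df
```

Each new column has one entry per original row, so the data frame keeps its four rows.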
Answer 2:
You may do this in dplyr using mutate:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))
...or use data.table to assign the new column by reference (:=):
library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
Answer 3:
One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way; e.g. ldply expects a list and returns a data.frame, dlply does the opposite, and so on. The second argument is the grouping variable(s). The third argument is the function we want to compute for each group.
library(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))
  id group value grp.mean.values
1  1     a    10              15
2  2     a    20              15
3  3     b   100             150
4  4     b   200             150
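The split, apply and combine steps that ddply performs can be approximated in base R. The sketch below is not how plyr is implemented; it only mirrors the idea described above, and the names pieces and res are made up for this example.

```r
# Sample data from the question (this answer calls it `dat`).
dat <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# 1. Split: one data frame per level of the grouping variable.
pieces <- split(dat, dat$group)

# 2. Apply: add the group mean to every row of each piece.
pieces <- lapply(pieces, function(d) transform(d, grp.mean.values = mean(value)))

# 3. Combine: stack the pieces back into a single data frame.
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the "a.1", "a.2", ... row names that rbind creates

res
```

Because transform recycles the single mean across all rows of a piece, the result keeps one row per original observation.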
Answer 4:
Here is another option using base functions aggregate and merge:
merge(x, aggregate(value ~ group, data = x, mean),
      by = "group")
  group id value.x value.y
1     a  1      10      15
2     a  2      20      15
3     b  3     100     150
4     b  4     200     150
You can get "better" column names with suffixes:
merge(x, aggregate(value ~ group, data = x, mean),
      by = "group", suffixes = c("", ".mean"))
  group id value value.mean
1     a  1    10         15
2     a  2    20         15
3     b  3   100        150
4     b  4   200        150
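It may help to see the two steps separately: aggregate first collapses the data to one row per group, and merge then joins that small table back onto the original rows. This is a sketch on the sample data from the question; the names grp_means and res are made up for this example.

```r
# Sample data from the question (this answer calls it `x`).
x <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# Step 1: one row per group, holding the group mean.
grp_means <- aggregate(value ~ group, data = x, FUN = mean)
grp_means

# Step 2: join the per-group table back onto the original rows.
# Both tables have a `value` column, so `suffixes` tells them apart.
res <- merge(x, grp_means, by = "group", suffixes = c("", ".mean"))
res
```

Note that merge sorts the result by the key column by default, so the row order can differ from the original data when the groups are not already sorted.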
Source: https://stackoverflow.com/questions/6053620/calculate-group-mean-or-other-summary-stats-and-assign-to-original-data