Calculate group mean (or other summary stats) and assign to original data

Submitted by 感情迁移 on 2019-11-25 21:52:37

Question


I want to calculate the mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").

The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should have a value corresponding to the current group value - the data set should not be collapsed to one row per group. For example, consider group mean:

Before

id  group  value
1   a      10
2   a      20
3   b      100
4   b      200

After

id  group  value  grp.mean.values
1   a      10     15
2   a      20     15
3   b      100    150
4   b      200    150

Answer 1:


Have a look at the ave function. Something like

df$grp.mean.values <- ave(df$value, df$group)

If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min:

df$grp.min <- ave(df$value, df$group, FUN = min)
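As a minimal, self-contained sketch of the above, the example data from the question can be rebuilt and passed through ave. The data frame name df follows this answer; the column names grp.sum and grp.n are made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# ave() applies FUN within each group and returns a vector
# as long as the input; when FUN is omitted it uses mean
df$grp.mean.values <- ave(df$value, df$group)

# Other length-one summaries are recycled across the group's rows
# (grp.sum and grp.n are illustrative column names)
df$grp.sum <- ave(df$value, df$group, FUN = sum)
df$grp.n   <- ave(df$value, df$group, FUN = length)

print(df)
```

Because ave returns one value per input row, the result can be assigned straight into a new column without any merge step.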



Answer 2:


You may do this in dplyr using mutate:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))

...or use data.table to assign the new column by reference (:=):

library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
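A hedged sketch of the dplyr variant with the question's data: several group statistics can be added in one mutate call, and ungroup() drops the grouping afterwards so later operations are not silently applied per group. This assumes the dplyr package is installed; grp.max, grp.n and res are illustrative names.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

res <- df %>%
  group_by(group) %>%
  mutate(
    grp.mean.values = mean(value),
    grp.max         = max(value),  # illustrative column name
    grp.n           = n()          # number of rows in the group
  ) %>%
  ungroup()                        # remove the grouping from the result

print(res)
```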



Answer 3:


One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). The other XXply functions follow the same naming scheme: for example, ldply expects a list and returns a data.frame, while dlply expects a data.frame and returns a list. The second argument is the grouping variable(s). The third argument is the function to apply to each group; here, transform adds the new column to each group's rows.

require(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))

  id group value grp.mean.values
1  1     a    10              15
2  2     a    20              15
3  3     b   100             150
4  4     b   200             150



Answer 4:


Here is another option using base functions aggregate and merge:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group")

  group id value.x value.y
1     a  1      10      15
2     a  2      20      15
3     b  3     100     150
4     b  4     200     150

You can get "better" column names with suffixes:

merge(x, aggregate(value ~ group, data = x, mean), 
     by = "group", suffixes = c("", ".mean"))


  group id value value.mean
1     a  1    10         15
2     a  2    20         15
3     b  3   100        150
4     b  4   200        150


Source: https://stackoverflow.com/questions/6053620/calculate-group-mean-or-other-summary-stats-and-assign-to-original-data
