问题
I want to average the values by their IDs but not all ID's have the same number of values. How do I do this in R?
I have two columns ID and Value
ID Value
1000 0.51
1000 0.01
1001 0.81
1001 0.41
1001 0.62
1002 0.98
1002 0.12
1002 0.15
1003 0.12
... ...
回答1:
You can try by()
:
> with(df, by(Value, ID, mean))
# ID: 1000
# [1] 0.26
# ------------------------------------------------------------
# ID: 1001
# [1] 0.6133333
# ------------------------------------------------------------
# ID: 1002
# [1] 0.4166667
# ------------------------------------------------------------
# ID: 1003
# [1] 0.12
or aggregate()
:
> aggregate( Value ~ ID, df, mean)
# ID Value
# 1 1000 0.2600000
# 2 1001 0.6133333
# 3 1002 0.4166667
# 4 1003 0.1200000
or using data.table
(if you need fast calculation on large data sets):
> library(data.table)
> setDT(df)[, mean(Value), by = ID]
# ID V1
# 1: 1000 0.2600000
# 2: 1001 0.6133333
# 3: 1002 0.4166667
# 4: 1003 0.1200000
data
df <- structure(list(ID = c(1000L, 1000L, 1001L, 1001L, 1001L, 1002L,
1002L, 1002L, 1003L), Value = c(0.51, 0.01, 0.81, 0.41, 0.62,
0.98, 0.12, 0.15, 0.12)), .Names = c("ID", "Value"),
class = "data.frame", row.names = c(NA, -9L))
回答2:
You could use the package dplyr
and the function summarise_each
:
df=data.frame(ID=c(1000,1000,1001,1001,1001,1002,1002,1002,1003), Value=c(0.51,0.01,0.81,0.41,0.62,0.98,0.12,0.15,0.12))
library(dplyr)
newdf <- df %>% group_by(ID) %>% summarise_each(funs(mean))
which gives you:
ID Value
(dbl) (dbl)
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000
If you deal with large datasets this should be the most efficient way of doing this task.
回答3:
Using sqldf
:
library(sqldf)
sqldf("SELECT ID, avg(Value) Mean
FROM df
GROUP BY ID")
Output:
ID Mean
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000
With dplyr
, instead of summarise_each
as Cleb pointed out, we can just use summarise
:
df %>% group_by(ID) %>% summarise(mean = mean(Value))
#or
summarise(group_by(df, ID), mean = mean(Value))
Output:
ID mean
(int) (dbl)
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000
来源:https://stackoverflow.com/questions/32487367/how-to-average-columns-based-on-id-in-r