Question
To obtain the absolute deviation from the mean for two groups of scores, I usually have to write long code in R, such as the code shown below.
I was wondering whether it is possible in base R to somehow vectorize the mad() function, so that the absolute deviations from the group mean could be obtained for each group of scores in the example below using that vectorized version of mad(). Any other workable ideas are highly appreciated.
set.seed(0)
y = as.vector(unlist(mapply(FUN = rnorm, n = c(10, 10)))) # Produces two sets of scores
groups = factor( rep(1:2, times = c(10, 10) ) ) # Grouping ID variable
G1 = y[groups == 1] # subset y scores for group 1
G2 = y[groups == 2] # subset y scores for group 2
G1.abs.dev = abs(G1 - mean(G1)) # absolute deviation from mean scores for group 1
G2.abs.dev = abs(G2 - mean(G2)) # absolute deviation from mean scores for group 2
Answer 1:
How about
score <- lapply(split(y, groups), FUN = function (u) abs(u - mean(u)))
or
score <- ave(y, groups, FUN = function (u) abs(u - mean(u)))
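The two results are organized differently: lapply() with split() returns a list with one vector per group, whereas ave() returns a single vector of the same length as y, in the original order. Choose whichever is more convenient for you.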
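As a rough sketch of how the two results relate, the list from split() and lapply() can be put back into the original order with unsplit() and compared with the ave() result. The names score_list and score_vec are only illustrative, and the data are regenerated here so the snippet runs on its own.

```r
set.seed(0)
y <- rnorm(20)                          # two sets of 10 scores, back to back
groups <- factor(rep(1:2, each = 10))   # grouping ID variable

# Absolute deviation of each value from the mean of its own group
abs_dev <- function(u) abs(u - mean(u))

# List with one numeric vector per group (names "1" and "2")
score_list <- lapply(split(y, groups), FUN = abs_dev)

# Plain numeric vector, same length and order as y
score_vec <- ave(y, groups, FUN = abs_dev)

# unsplit() reverses split(), so both approaches hold the same values
same <- isTRUE(all.equal(unsplit(score_list, groups), score_vec,
                         check.attributes = FALSE))
```

The list form is handy when each group is processed separately afterwards; the vector form fits directly into a data frame column.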
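One note on wording: mad() returns a single statistic for a vector of data (the median absolute deviation from the median, scaled by the constant 1.4826 by default), not one value per observation. For example,
sapply(split(y, groups), mad)
So you are not vectorizing mad(); as your example code shows, you are simply computing the absolute deviation from the group mean for each datum.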
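To make the distinction concrete, here is a small sketch contrasting per-group summaries with per-observation deviations. The variable names are illustrative only, and the data are regenerated so the snippet is self-contained.

```r
set.seed(0)
y <- rnorm(20)
groups <- factor(rep(1:2, each = 10))

# One number per group: what mad() computes (scaled median absolute
# deviation from the group median)
mad_by_group <- tapply(y, groups, mad)

# One number per group: mean of the absolute deviations from the group mean
mean_abs_dev_by_group <- tapply(y, groups, function(u) mean(abs(u - mean(u))))

# One number per observation: what the question's code actually computes
abs_dev_per_obs <- ave(y, groups, FUN = function(u) abs(u - mean(u)))

length(mad_by_group)           # 2
length(mean_abs_dev_by_group)  # 2
length(abs_dev_per_obs)        # 20
```

The first two are summaries of spread and generally differ from each other; only the third has one entry per score.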
Answer 2:
If you stick everything in a data.frame, it's much cleaner. In base R,
set.seed(0)
df <- data.frame(y = rnorm(20),
                 group = rep(1:2, each = 10))
df$abs_dev <- with(df, ave(y, group, FUN = function(x){abs(mean(x) - x)}))
df
#> y group abs_dev
#> 1 1.262954285 1 0.90403032
#> 2 -0.326233361 1 0.68515732
#> 3 1.329799263 1 0.97087530
#> 4 1.272429321 1 0.91350536
#> 5 0.414641434 1 0.05571747
#> 6 -1.539950042 1 1.89887401
#> 7 -0.928567035 1 1.28749100
#> 8 -0.294720447 1 0.65364441
#> 9 -0.005767173 1 0.36469114
#> 10 2.404653389 1 2.04572943
#> 11 0.763593461 2 1.12607477
#> 12 -0.799009249 2 0.43652794
#> 13 -1.147657009 2 0.78517570
#> 14 -0.289461574 2 0.07301974
#> 15 -0.299215118 2 0.06326619
#> 16 -0.411510833 2 0.04902952
#> 17 0.252223448 2 0.61470476
#> 18 -0.891921127 2 0.52943981
#> 19 0.435683299 2 0.79816461
#> 20 -1.237538422 2 0.87505711
or with dplyr:
library(dplyr)
set.seed(0)
df <- tibble(y = rnorm(20),   # tibble() replaces the deprecated data_frame()
             group = rep(1:2, each = 10))
df <- df %>% group_by(group) %>% mutate(abs_dev = abs(mean(y) - y))
df
#> # A tibble: 20 x 3
#> # Groups: group [2]
#> y group abs_dev
#> <dbl> <int> <dbl>
#> 1 1.262954285 1 0.90403032
#> 2 -0.326233361 1 0.68515732
#> 3 1.329799263 1 0.97087530
#> 4 1.272429321 1 0.91350536
#> 5 0.414641434 1 0.05571747
#> 6 -1.539950042 1 1.89887401
#> 7 -0.928567035 1 1.28749100
#> 8 -0.294720447 1 0.65364441
#> 9 -0.005767173 1 0.36469114
#> 10 2.404653389 1 2.04572943
#> 11 0.763593461 2 1.12607477
#> 12 -0.799009249 2 0.43652794
#> 13 -1.147657009 2 0.78517570
#> 14 -0.289461574 2 0.07301974
#> 15 -0.299215118 2 0.06326619
#> 16 -0.411510833 2 0.04902952
#> 17 0.252223448 2 0.61470476
#> 18 -0.891921127 2 0.52943981
#> 19 0.435683299 2 0.79816461
#> 20 -1.237538422 2 0.87505711
or data.table:
library(data.table)
set.seed(0)
dt <- data.table(y = rnorm(20),
                 group = rep(1:2, each = 10))
dt[, abs_dev := abs(mean(y) - y), by = group][]
#> y group abs_dev
#> 1: 1.262954285 1 0.90403032
#> 2: -0.326233361 1 0.68515732
#> 3: 1.329799263 1 0.97087530
#> 4: 1.272429321 1 0.91350536
#> 5: 0.414641434 1 0.05571747
#> 6: -1.539950042 1 1.89887401
#> 7: -0.928567035 1 1.28749100
#> 8: -0.294720447 1 0.65364441
#> 9: -0.005767173 1 0.36469114
#> 10: 2.404653389 1 2.04572943
#> 11: 0.763593461 2 1.12607477
#> 12: -0.799009249 2 0.43652794
#> 13: -1.147657009 2 0.78517570
#> 14: -0.289461574 2 0.07301974
#> 15: -0.299215118 2 0.06326619
#> 16: -0.411510833 2 0.04902952
#> 17: 0.252223448 2 0.61470476
#> 18: -0.891921127 2 0.52943981
#> 19: 0.435683299 2 0.79816461
#> 20: -1.237538422 2 0.87505711
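If this pattern comes up often, the base R version could be wrapped in a small helper. This is only a sketch: group_abs_dev and its center argument are hypothetical names invented here, not functions from base R or any package, and missing values are not handled.

```r
# Hypothetical helper: absolute deviation of each value from the center
# (mean by default) of the group it belongs to.
group_abs_dev <- function(x, g, center = mean) {
  stopifnot(length(x) == length(g))
  ave(x, g, FUN = function(u) abs(u - center(u)))
}

set.seed(0)
df <- data.frame(y = rnorm(20),
                 group = rep(1:2, each = 10))

# Deviations from the group mean, as in the examples above
df$abs_dev <- group_abs_dev(df$y, df$group)

# Same idea with the group median as the center
df$abs_dev_median <- group_abs_dev(df$y, df$group, center = median)
```

Passing the center as a function keeps the grouping logic in one place while leaving the choice between mean and median to the caller.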
Source: https://stackoverflow.com/questions/44738753/obtaining-absolute-deviation-from-mean-for-two-sets-of-scores