Question
I am trying to provide a basic summary of my Likert-type (5-point) survey results. I know how to apply basic mathematical functions across subgroups with the aggregate function. For instance, I can produce the mean of each item by subgroup, but I do not know how to get the percentage of responses that fall into a set of two or more response options (say 4 and 5) for each item.
I have always used SPSS to aggregate the proportion of positive responses (say 4 and 5) for each item across subgroups. The result was the percentage of positive responses (favorability) for each item, broken down by subgroup.
### What I can produce
aggregate(dataset[items], by=subgroup, FUN=mean)
### What I am trying to produce
aggregate(dataset[items], by=subgroup, FUN=[proportion of 4 and 5 choices on each item])
Answer 1:
Consider a combination of `aggregate` and `ave` (the inline aggregation function, which returns the same number of rows as its input). Specifically, count the Likert values per subgroup with `aggregate` (using the formula style for readability, with `cbind` to rename the column), then divide each count by the total count of its subgroup with `ave` to get the proportion.
agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))
agg_df
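As a small illustration of why `ave` fits the second step, the sketch below uses made-up toy counts (the `toy` data frame and its column names are hypothetical, not taken from the OP's survey). It contrasts `aggregate`, which collapses to one row per group, with `ave`, which repeats each group's total on every row so it can be used directly as a denominator.

```r
# Hypothetical toy counts for two groups (illustration only)
toy <- data.frame(
  grp   = c("a", "a", "b", "b", "b"),
  count = c(2, 6, 1, 1, 2)
)

# aggregate() collapses the data to one row per group (a: 8, b: 4)
group_sums <- aggregate(count ~ grp, toy, FUN = sum)

# ave() returns a vector as long as the input, with each group's
# total repeated on every row that belongs to that group
toy$group_total <- ave(toy$count, toy$grp, FUN = sum)

# Row-wise division then gives each row's share of its own group
toy$prop <- toy$count / toy$group_total
toy
```

Because the result of `ave` lines up with the original rows, no merge back onto the aggregated counts is needed.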
To demonstrate, here is random, seeded data (to be replaced with the OP's data). The code below assumes the Likert responses are in long format; wide data can be reshaped to long first:
Data
set.seed(8302019)
dataset <- data.frame(
subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
likert = sample(1:5, 500, replace=TRUE),
some_num_col = 1
)
head(dataset, 20)
#    subgroup likert some_num_col
# 1     julia      5            1
# 2    python      1            1
# 3      spss      5            1
# 4       sas      1            1
# 5       sas      4            1
# 6      spss      2            1
# 7         r      5            1
# 8         r      5            1
# 9         r      1            1
# 10     spss      3            1
# 11     spss      4            1
# 12      sas      3            1
# 13     spss      5            1
# 14     spss      1            1
# 15     spss      2            1
# 16      sas      4            1
# 17        r      2            1
# 18      sas      4            1
# 19      sas      4            1
# 20     spss      1            1
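The demonstration data is already in long format. If the survey is stored wide (one column per item), one possible way to get to long format in base R is `stack`. This is only a sketch: the `wide` data frame and the item names `q1` and `q2` are hypothetical stand-ins for the OP's items.

```r
# Hypothetical wide survey data: one row per respondent, one column per item
wide <- data.frame(
  subgroup = c("r", "r", "spss"),
  q1 = c(5, 2, 4),
  q2 = c(3, 4, 1)
)
items <- c("q1", "q2")

# stack() puts the item columns on top of each other ("values") and
# records the source column name in "ind"; the subgroup column is
# repeated once per item so it stays aligned with the stacked values
long <- data.frame(
  subgroup = rep(wide$subgroup, times = length(items)),
  stack(wide[items])
)

# Rename to match the column names used in the answer
names(long)[names(long) == "values"] <- "likert"
names(long)[names(long) == "ind"] <- "item"
long
```

With several items in one long table, `item` would presumably be added to the right-hand side of the `aggregate` formula and to the grouping in `ave`, so that proportions are computed per item and subgroup.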
Proportion by Subgroup
agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))
agg_df
#    likert subgroup count      prop
# 1       1    julia    21 0.2359551
# 2       2    julia    16 0.1797753
# 3       3    julia    18 0.2022472
# 4       4    julia    17 0.1910112
# 5       5    julia    17 0.1910112
# 6       1   python    14 0.1891892
# 7       2   python    16 0.2162162
# 8       3   python    16 0.2162162
# 9       4   python    16 0.2162162
# 10      5   python    12 0.1621622
# 11      1        r    20 0.2061856
# 12      2        r    19 0.1958763
# 13      3        r    26 0.2680412
# 14      4        r    17 0.1752577
# 15      5        r    15 0.1546392
# 16      1      sas    18 0.1956522
# 17      2      sas    16 0.1739130
# 18      3      sas    24 0.2608696
# 19      4      sas    18 0.1956522
# 20      5      sas    16 0.1739130
# 21      1     spss    13 0.1688312
# 22      2     spss    22 0.2857143
# 23      3     spss    15 0.1948052
# 24      4     spss    16 0.2077922
# 25      5     spss    11 0.1428571
# 26      1    stata    17 0.2394366
# 27      2    stata     8 0.1126761
# 28      3    stata    16 0.2253521
# 29      4    stata    12 0.1690141
# 30      5    stata    18 0.2535211
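The table above gives the share of each individual Likert value. For the combined favorability the question asks about (responses of 4 or 5), a minimal sketch is to turn the responses into a logical flag and take its mean, since the mean of a logical vector is the proportion of `TRUE` values. The `favorable` column, the `wide` data frame, and the item names `q1` and `q2` are assumptions introduced for illustration.

```r
# Same seeded demonstration data as above (long format)
set.seed(8302019)
dataset <- data.frame(
  subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace = TRUE),
  likert   = sample(1:5, 500, replace = TRUE)
)

# TRUE when the response is 4 or 5
dataset$favorable <- dataset$likert >= 4

# Mean of the flag per subgroup = proportion of favorable responses
fav_df <- aggregate(favorable ~ subgroup, dataset, FUN = mean)
fav_df

# Wide layout, closer to the call in the question: hypothetical items q1, q2
wide <- data.frame(
  subgroup = c("r", "r", "spss", "spss"),
  q1 = c(5, 2, 4, 4),
  q2 = c(3, 4, 1, 2)
)
items <- c("q1", "q2")

# wide[items] >= 4 is a logical matrix; aggregate() averages each column by subgroup
fav_wide <- aggregate(wide[items] >= 4, by = list(subgroup = wide$subgroup), FUN = mean)
fav_wide
```

Multiplying the resulting columns by 100 gives percentages. Missing responses are not handled in this sketch and would need a decision on whether they count in the denominator.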
Source: https://stackoverflow.com/questions/57728697/how-to-calculate-cumulative-proportion-of-likert-type-responses-in-r