How to calculate cumulative proportion of Likert-type responses in R?

Question

I am trying to provide a basic summary of my Likert-type (5-point) survey results. I know how to apply basic mathematical functions across subgroups with the aggregate function. For instance, I can produce the mean of each item across subgroups, but I do not know how to get the percentage of respondents who chose any of several response options (for example, 4 or 5) on each item.

In SPSS, I have always computed the proportion of positive responses (say, 4 and 5) for each item across subgroups. That gives me the percentage of positive responses (favorability) for each item, broken down by subgroup.

### What I can produce
aggregate(dataset[items], by=subgroup, FUN=mean)
### What I am trying to produce
aggregate(dataset[items], by=subgroup, FUN=[proportion of 4 and 5 choices on each item])
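For the wide layout described in the question, one way to fill in the FUN placeholder is a small function that returns the share of 4s and 5s: mean() of a logical vector is the proportion of TRUE values. This is a minimal sketch; the data frame survey and its columns group, item1 and item2 are hypothetical stand-ins for the real survey, and it assumes missing responses should be left out of the denominator.

```r
# Hypothetical wide-format survey: one row per respondent, one column per item
survey <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  item1 = c(5, 4, 2, 1, 4),
  item2 = c(3, 5, 5, 2, 2)
)
items <- c("item1", "item2")

# Share of favorable answers (4 or 5); NA responses are dropped first
favorable <- function(x) mean(x[!is.na(x)] %in% c(4, 5))

# by must be a list; naming it gives the grouping column a readable name
fav <- aggregate(survey[items], by = list(group = survey$group), FUN = favorable)
fav
```

Here group a scores 2/3 on both items and group b scores 0.5 and 0; multiply by 100 to report percentages.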

Answer 1:

Consider combining aggregate with ave, an inline aggregation function that returns as many rows as its input. First, count the Likert values per subgroup with aggregate (the formula interface is easier to read, and cbind renames the count column). Then divide each count by its subgroup total with ave to get the proportions.

agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)

agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))

agg_df
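To see why ave() fits the proportion step: unlike aggregate(), it returns one value per input row, so each count can be divided by its own group total. A toy illustration with made-up vectors counts and groups:

```r
# ave() applies FUN within each group and writes the result back onto
# every row of that group, so the output has the same length as the input
counts <- c(3, 1, 2, 4)
groups <- c("x", "x", "y", "y")

group_totals <- ave(counts, groups, FUN = sum)  # 4 4 6 6
props <- counts / group_totals                 # 0.75 0.25 0.3333333 0.6666667
props
```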

To demonstrate, here is random, seeded data (replace it with the OP's data). The example below assumes the Likert responses are in long format; wide data can be reshaped first.

Data

set.seed(8302019)
dataset <- data.frame(
  subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
  likert = sample(1:5, 500, replace=TRUE),
  some_num_col = 1
)

head(dataset, 20)
#    subgroup likert some_num_col
# 1     julia      5            1
# 2    python      1            1
# 3      spss      5            1
# 4       sas      1            1
# 5       sas      4            1
# 6      spss      2            1
# 7         r      5            1
# 8         r      5            1
# 9         r      1            1
# 10     spss      3            1
# 11     spss      4            1
# 12      sas      3            1
# 13     spss      5            1
# 14     spss      1            1
# 15     spss      2            1
# 16      sas      4            1
# 17        r      2            1
# 18      sas      4            1
# 19      sas      4            1
# 20     spss      1            1
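The seeded data above is already long. If the real survey is wide (one column per item), base R's reshape() can stack it first, after which the answer's formula can take the item as an extra grouping variable. A minimal sketch; wide, id, item1 and item2 are hypothetical names, and some_num_col is added only because the counting formula uses it.

```r
# Hypothetical wide data: one row per respondent, one column per item
wide <- data.frame(
  id = 1:3,
  subgroup = c("r", "sas", "r"),
  item1 = c(5, 2, 4),
  item2 = c(1, 4, 4)
)

# Stack item1/item2 into a single likert column, one row per respondent-item
long <- reshape(
  wide,
  direction = "long",
  varying   = c("item1", "item2"),  # columns holding the responses
  v.names   = "likert",             # name of the stacked response column
  timevar   = "item",               # records which item each row came from
  times     = c("item1", "item2"),
  idvar     = "id"
)
long$some_num_col <- 1  # constant column counted by the formula below

# Same approach as the answer, with item added as a grouping variable
agg_items <- aggregate(cbind(count = some_num_col) ~ likert + item + subgroup,
                       long, FUN = length)
agg_items$prop <- with(agg_items, count / ave(count, item, subgroup, FUN = sum))
agg_items
```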

Proportion by Subgroup

agg_df <- aggregate(cbind(count=some_num_col) ~ likert + subgroup, dataset, FUN=length)

agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN=sum))

agg_df
#    likert subgroup count      prop
# 1       1    julia    21 0.2359551
# 2       2    julia    16 0.1797753
# 3       3    julia    18 0.2022472
# 4       4    julia    17 0.1910112
# 5       5    julia    17 0.1910112
# 6       1   python    14 0.1891892
# 7       2   python    16 0.2162162
# 8       3   python    16 0.2162162
# 9       4   python    16 0.2162162
# 10      5   python    12 0.1621622
# 11      1        r    20 0.2061856
# 12      2        r    19 0.1958763
# 13      3        r    26 0.2680412
# 14      4        r    17 0.1752577
# 15      5        r    15 0.1546392
# 16      1      sas    18 0.1956522
# 17      2      sas    16 0.1739130
# 18      3      sas    24 0.2608696
# 19      4      sas    18 0.1956522
# 20      5      sas    16 0.1739130
# 21      1     spss    13 0.1688312
# 22      2     spss    22 0.2857143
# 23      3     spss    15 0.1948052
# 24      4     spss    16 0.2077922
# 25      5     spss    11 0.1428571
# 26      1    stata    17 0.2394366
# 27      2    stata     8 0.1126761
# 28      3    stata    16 0.2253521
# 29      4    stata    12 0.1690141
# 30      5    stata    18 0.2535211
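The title asks for cumulative proportions, and the question asks for favorability (the share of 4 and 5). Both can be derived from agg_df: a cumsum() within each subgroup via ave(), and a sum of the 4 and 5 rows per subgroup. This is a sketch assuming the scale is ordered 1 to 5; cum_prop and fav_df are illustrative names, and the block repeats the answer's seeded data so it runs on its own.

```r
# Recreate the answer's data and per-subgroup proportions
set.seed(8302019)
dataset <- data.frame(
  subgroup = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace = TRUE),
  likert = sample(1:5, 500, replace = TRUE),
  some_num_col = 1
)
agg_df <- aggregate(cbind(count = some_num_col) ~ likert + subgroup, dataset, FUN = length)
agg_df$prop <- with(agg_df, count / ave(count, subgroup, FUN = sum))

# Order rows so cumsum() runs from the lowest to the highest Likert value
agg_df <- agg_df[order(agg_df$subgroup, agg_df$likert), ]

# Cumulative proportion: share of the subgroup answering at or below each value
agg_df$cum_prop <- with(agg_df, ave(prop, subgroup, FUN = cumsum))

# Favorability: share answering 4 or 5, one row per subgroup
fav_df <- aggregate(prop ~ subgroup, data = agg_df[agg_df$likert >= 4, ], FUN = sum)
fav_df
```

Each subgroup's cum_prop ends at 1, and fav_df$prop matches mean(likert >= 4) computed directly within each subgroup.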

Source: https://stackoverflow.com/questions/57728697/how-to-calculate-cumulative-proportion-of-likert-type-responses-in-r

Tags

aggregate