Creating a “crosstabs” style output

Question

I have a sample dataset:

id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)

df <- as.data.frame(cbind(id, gender, age, ethnicity, grade))

The output I'm trying to achieve looks like this:

+-------------+-------+----+----+----+----+
| Column Name | Value | A  | B  | C  | D  |
+-------------+-------+----+----+----+----+
| Gender      | F     | 15 | 11 | 17 | 10 |
| Gender      | M     |  9 | 17 | 14 |  7 |
| Age         | 18    |  4 |  6 |  5 |  4 |
| Age         | 19    |  3 |  6 |  4 |  3 |
| Age         | 20    |  5 |  6 |  7 |  3 |
| Age         | 21    |  7 |  7 |  5 |  4 |
| Age         | 22    |  5 |  3 | 10 |  3 |
| Ethnicity   | A     |  1 |  9 |  9 |  6 |
| Ethnicity   | B     |  7 |  8 |  5 |  2 |
| Ethnicity   | H     |  4 |  4 |  5 |  2 |
| Ethnicity   | O     |  6 |  4 |  5 |  4 |
| Ethnicity   | W     |  6 |  3 |  7 |  3 |
+-------------+-------+----+----+----+----+

So I'm not trying to create a row that combines the three categorical variables (e.g. "Hispanic Females Age 22 got 2 A's, 0 B's, 2 C's, etc."). I just want the grade distribution broken out separately by gender, age, and ethnicity, with all of their values stacked in one column.

What's the best way to accomplish this?

Answer 1:

Using dplyr and tidyr, we can reshape the data to long format, count the occurrences of each value for each grade, and then reshape the counts back to wide format.

library(dplyr)
library(tidyr)

df %>%
 select(-id) %>%
 pivot_longer(cols = -grade) %>%
 count(value, grade) %>%
 pivot_wider(names_from = grade, values_from = n)


# A tibble: 12 x 5
#   value     A     B     C     D
#   <fct> <int> <int> <int> <int>
# 1 F         8    10    12    13
# 2 M        13    18    11    15
# 3 18        2     4     7     6
# 4 19        5     6     4     4
# 5 20        3     6     3     8
# 6 21        6     5     5     3
# 7 22        5     7     4     7
# 8 A         5     3     1     5
# 9 B         5     5     6     7
#10 H         1     4     3     3
#11 O         3    10     7     7
#12 W         7     6     6     6

data

set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- as.data.frame(cbind(id, gender, age, ethnicity, grade))
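
The result above has no "Column Name" column, which the requested layout includes. A minimal sketch of one way to keep it, assuming dplyr 1.0.0 or later (for across()) and a reasonably recent tidyr (for a scalar values_fill); the names column_name and crosstab are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

set.seed(123)
df <- data.frame(
  id        = 1:100,
  gender    = sample(c("M", "F"), 100, replace = TRUE),
  age       = sample(18:22, 100, replace = TRUE),
  ethnicity = sample(c("W", "B", "H", "A", "O"), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

crosstab <- df %>%
  select(-id) %>%
  # pivot_longer() needs a single common type for the stacked value column
  mutate(across(-grade, as.character)) %>%
  pivot_longer(cols = -grade, names_to = "column_name", values_to = "value") %>%
  # counting by column_name as well keeps the source column in the result
  count(column_name, value, grade) %>%
  # values_fill = 0 shows absent combinations as 0 instead of NA
  pivot_wider(names_from = grade, values_from = n, values_fill = 0)

crosstab
```

Because count() sorts its output, the rows come out ordered alphabetically by column_name (age, ethnicity, gender) rather than in the original column order.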

Answer 2:

We can use melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(df[, -1]), id.vars = 'grade'), value ~ grade, length)
#    value  A  B  C  D
# 1:    18  2  4  7  6
# 2:    19  5  6  4  4
# 3:    20  3  6  3  8
# 4:    21  6  5  5  3
# 5:    22  5  7  4  7
# 6:     A  5  3  1  5
# 7:     B  5  5  6  7
# 8:     F  8 10 12 13
# 9:     H  1  4  3  3
#10:     M 13 18 11 15
#11:     O  3 10  7  7
#12:     W  7  6  6  6

data

set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- data.frame(id, gender, age, ethnicity, grade)
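
The same idea can be extended to keep the source column in the data.table result. This is a sketch under a few assumptions: age is converted to character up front so that melt() does not have to coerce mixed column types, and the names dt, long, column_name and crosstab_dt are illustrative, not from the original answer:

```r
library(data.table)

set.seed(123)
dt <- data.table(
  gender    = sample(c("M", "F"), 100, replace = TRUE),
  age       = as.character(sample(18:22, 100, replace = TRUE)),
  ethnicity = sample(c("W", "B", "H", "A", "O"), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

# Stack gender, age and ethnicity into (column_name, value) pairs per grade
long <- melt(dt, id.vars = "grade",
             variable.name = "column_name", value.name = "value")

# One row per (column_name, value); one count column per grade
crosstab_dt <- dcast(long, column_name + value ~ grade,
                     fun.aggregate = length, value.var = "value")

crosstab_dt
```

melt() stores column_name as a factor whose levels follow the original column order, so the rows should come out grouped as gender, age, ethnicity, which is close to the layout asked for in the question.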

Source: https://stackoverflow.com/questions/59621553/creating-a-crosstabs-style-output

Tags

dplyr

tidyr