Creating a “crosstabs” style output

Submitted by 泄露秘密 on 2020-01-25 08:25:06

Question


I have a sample dataset:

id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)

# cbind() coerces every column to character; no pipe (and so no dplyr) is needed here
df <- as.data.frame(cbind(id, gender, age, ethnicity, grade))

The output I'm trying to achieve looks like this:

+-------------+-------+----+----+----+----+
| Column Name | Value | A  | B  | C  | D  |
+-------------+-------+----+----+----+----+
| Gender      | F     | 15 | 11 | 17 | 10 |
| Gender      | M     |  9 | 17 | 14 |  7 |
| Age         | 18    |  4 |  6 |  5 |  4 |
| Age         | 19    |  3 |  6 |  4 |  3 |
| Age         | 20    |  5 |  6 |  7 |  3 |
| Age         | 21    |  7 |  7 |  5 |  4 |
| Age         | 22    |  5 |  3 | 10 |  3 |
| Ethnicity   | A     |  1 |  9 |  9 |  6 |
| Ethnicity   | B     |  7 |  8 |  5 |  2 |
| Ethnicity   | H     |  4 |  4 |  5 |  2 |
| Ethnicity   | O     |  6 |  4 |  5 |  4 |
| Ethnicity   | W     |  6 |  3 |  7 |  3 |
+-------------+-------+----+----+----+----+

I'm not trying to create rows that combine the three categorical variables (e.g. "Hispanic females aged 22 got 2 A's, 0 B's, 2 C's, etc."). I just want the grade distribution broken out separately by gender, age, and ethnicity, with all three stacked in the same pair of columns.

What's the best way to accomplish this?


Answer 1:


Using dplyr and tidyr, we can reshape the data to long format, count the occurrences of each value for each grade, and then reshape the result back to wide format.

library(dplyr)
library(tidyr)

df %>%
 select(-id) %>%
 pivot_longer(cols = -grade) %>%
 count(value, grade) %>%
 pivot_wider(names_from = grade, values_from = n)


# A tibble: 12 x 5
#   value     A     B     C     D
#   <fct> <int> <int> <int> <int>
# 1 F         8    10    12    13
# 2 M        13    18    11    15
# 3 18        2     4     7     6
# 4 19        5     6     4     4
# 5 20        3     6     3     8
# 6 21        6     5     5     3
# 7 22        5     7     4     7
# 8 A         5     3     1     5
# 9 B         5     5     6     7
#10 H         1     4     3     3
#11 O         3    10     7     7
#12 W         7     6     6     6
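
The result above has no "Column Name" column, because count(value, grade) discards the name column that pivot_longer() creates. A minimal sketch of a variant that keeps it is below. It assumes dplyr >= 1.0.0 (for across()) and tidyr >= 1.1.0; the names column_name and crosstab are illustrative and not from the original answer, and the data is built with data.frame() and converted to character explicitly rather than through cbind().

```r
library(dplyr)
library(tidyr)

set.seed(123)
df <- data.frame(
  gender    = sample(c("M", "F"), 100, replace = TRUE),
  age       = sample(18:22, 100, replace = TRUE),
  ethnicity = sample(c("W", "B", "H", "A", "O"), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

crosstab <- df %>%
  # pivot_longer() needs a single type for the value column
  mutate(across(-grade, as.character)) %>%
  # "column_name" is a hypothetical label for the source variable
  pivot_longer(cols = -grade, names_to = "column_name", values_to = "value") %>%
  # counting by column_name as well keeps track of where each value came from
  count(column_name, value, grade) %>%
  # fill value/grade combinations that never occur with 0 instead of NA
  pivot_wider(names_from = grade, values_from = n, values_fill = 0L)

print(crosstab)
```

count() sorts its grouping columns, so the rows come out as age, ethnicity, gender rather than in the order shown in the question; converting column_name to a factor with the desired level order before counting should change that.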

data

set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- cbind(id,gender,age,ethnicity,grade) %>% as.data.frame()



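Answer 2:


We can use melt/dcast from data.table.

library(data.table)
# drop id, melt everything except grade to long format, then count per value and grade
dcast(melt(setDT(df[, -1]), id.vars = 'grade'), value ~ grade, fun.aggregate = length)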
#    value  A  B  C  D
# 1:    18  2  4  7  6
# 2:    19  5  6  4  4
# 3:    20  3  6  3  8
# 4:    21  6  5  5  3
# 5:    22  5  7  4  7
# 6:     A  5  3  1  5
# 7:     B  5  5  6  7
# 8:     F  8 10 12 13
# 9:     H  1  4  3  3
#10:     M 13 18 11 15
#11:     O  3 10  7  7
#12:     W  7  6  6  6
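
As with the first answer, this output does not record which variable each value belongs to. A sketch of one way to keep that information is to add the melted variable column to the left-hand side of the dcast() formula. The names dt, long, crosstab_dt and column_name are illustrative assumptions, and age is converted to character up front so that melt() does not have to coerce mixed column types.

```r
library(data.table)

set.seed(123)
dt <- data.table(
  gender    = sample(c("M", "F"), 100, replace = TRUE),
  age       = as.character(sample(18:22, 100, replace = TRUE)),
  ethnicity = sample(c("W", "B", "H", "A", "O"), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

# long format: one row per (grade, source column, value)
long <- melt(dt, id.vars = "grade",
             variable.name = "column_name", value.name = "value")

# wide format: one row per (source column, value), one count column per grade
crosstab_dt <- dcast(long, column_name + value ~ grade,
                     fun.aggregate = length, value.var = "value")

print(crosstab_dt)
```

Because melt() stores column_name as a factor in the original column order, the rows here should follow gender, age, ethnicity, which is the order requested in the question.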

data

set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- data.frame(id, gender, age, ethnicity, grade)


Source: https://stackoverflow.com/questions/59621553/creating-a-crosstabs-style-output
