Question
I have a sample dataset:
library(dplyr)  # provides the %>% pipe used below
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- cbind(id,gender,age,ethnicity,grade) %>% as.data.frame()
The output I'm trying to achieve looks like this:
+-------------+-------+----+----+----+----+
| Column Name | Value | A  | B  | C  | D  |
+-------------+-------+----+----+----+----+
| Gender      | F     | 15 | 11 | 17 | 10 |
| Gender      | M     |  9 | 17 | 14 |  7 |
| Age         | 18    |  4 |  6 |  5 |  4 |
| Age         | 19    |  3 |  6 |  4 |  3 |
| Age         | 20    |  5 |  6 |  7 |  3 |
| Age         | 21    |  7 |  7 |  5 |  4 |
| Age         | 22    |  5 |  3 | 10 |  3 |
| Ethnicity   | A     |  1 |  9 |  9 |  6 |
| Ethnicity   | B     |  7 |  8 |  5 |  2 |
| Ethnicity   | H     |  4 |  4 |  5 |  2 |
| Ethnicity   | O     |  6 |  4 |  5 |  4 |
| Ethnicity   | W     |  6 |  3 |  7 |  3 |
+-------------+-------+----+----+----+----+
I'm not trying to create rows that combine the three categorical variables (e.g., "Hispanic females aged 22 got 2 A's, 0 B's, 2 C's, etc."). I just want the grade distribution broken out separately by gender, by age, and by ethnicity, with all three variables stacked in one column.
What's the best way to accomplish this?
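Put differently, each block of the target table is an ordinary two-way table of one variable against grade, and the blocks are stacked on top of each other. A minimal base-R sketch of that idea, assuming the same simulated data (rebuilt here with data.frame() and set.seed(123) so it runs on its own); the helper crosstab_block() and the column names column_name and value are illustrative, not from the original post:

```r
set.seed(123)
df <- data.frame(
  id        = 1:100,
  gender    = sample(c('M', 'F'), 100, replace = TRUE),
  age       = sample(18:22, 100, replace = TRUE),
  ethnicity = sample(c('W', 'B', 'H', 'A', 'O'), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

# Hypothetical helper: one block = two-way table of one variable against grade
crosstab_block <- function(col) {
  tab <- table(df[[col]], df$grade)   # rows = values of `col`, columns = grades
  block <- as.data.frame.matrix(tab)  # keep the wide shape, one column per grade
  data.frame(
    column_name      = col,
    value            = rownames(block),
    block,
    row.names        = NULL,
    check.names      = FALSE,
    stringsAsFactors = FALSE
  )
}

# Stack the blocks in the order shown in the desired output
result <- do.call(rbind, lapply(c('gender', 'age', 'ethnicity'), crosstab_block))
print(result)
```

Because every block tabulates against the same grade vector, all blocks share the columns A to D and rbind() can stack them directly; a value that never occurs with some grade shows up as 0 rather than being dropped.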
Answer 1:
Using dplyr and tidyr, we can reshape the data into long format, count the occurrences of each value for each grade, and then reshape it back into wide format.
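library(dplyr)
library(tidyr)

df %>%
  select(-id) %>%                    # drop the identifier column
  pivot_longer(cols = -grade) %>%    # one row per (grade, variable name, value)
  count(value, grade) %>%            # tally each value within each grade
  pivot_wider(names_from = grade, values_from = n)  # one column per grade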
# A tibble: 12 x 5
# value A B C D
# <fct> <int> <int> <int> <int>
# 1 F 8 10 12 13
# 2 M 13 18 11 15
# 3 18 2 4 7 6
# 4 19 5 6 4 4
# 5 20 3 6 3 8
# 6 21 6 5 5 3
# 7 22 5 7 4 7
# 8 A 5 3 1 5
# 9 B 5 5 6 7
#10 H 1 4 3 3
#11 O 3 10 7 7
#12 W 7 6 6 6
Data:
set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- cbind(id,gender,age,ethnicity,grade) %>% as.data.frame()
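The desired output also has a Column Name column, which count(value, grade) drops. A minimal sketch that keeps it by naming the variable column in pivot_longer() and counting on it too, assuming dplyr 1.0 or later (for across()) and tidyr 1.1 or later (where values_fill accepts a single value). It builds the data with data.frame() instead of cbind(), so age stays an integer and must be converted to character before pivoting (with cbind() every column is already character); column_name is an illustrative name:

```r
library(dplyr)
library(tidyr)

set.seed(123)
df <- data.frame(
  id        = 1:100,
  gender    = sample(c('M', 'F'), 100, replace = TRUE),
  age       = sample(18:22, 100, replace = TRUE),
  ethnicity = sample(c('W', 'B', 'H', 'A', 'O'), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

result <- df %>%
  select(-id) %>%
  # pivot_longer() needs one common type; age is integer, so make everything character
  mutate(across(everything(), as.character)) %>%
  # keep the source column name so each value can be traced to its variable
  pivot_longer(cols = -grade, names_to = "column_name", values_to = "value") %>%
  count(column_name, value, grade) %>%
  # combinations that never occur become 0 instead of NA
  pivot_wider(names_from = grade, values_from = n, values_fill = 0)

print(result)
```

Rows come out sorted alphabetically by column_name (age, ethnicity, gender); convert column_name to a factor with explicit levels first if the question's order matters.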
Answer 2:
We can use melt/dcast from data.table:
library(data.table)
# Melt to long format keyed by grade, then cast back to wide, counting rows per cell
dcast(melt(setDT(df[, -1]), id.vars = 'grade'), value ~ grade, length)
# value A B C D
# 1: 18 2 4 7 6
# 2: 19 5 6 4 4
# 3: 20 3 6 3 8
# 4: 21 6 5 5 3
# 5: 22 5 7 4 7
# 6: A 5 3 1 5
# 7: B 5 5 6 7
# 8: F 8 10 12 13
# 9: H 1 4 3 3
#10: M 13 18 11 15
#11: O 3 10 7 7
#12: W 7 6 6 6
Data:
set.seed(123)
id <- 1:100
gender <- sample(c('M','F'), 100, replace=TRUE)
age <- sample(18:22, 100, replace=TRUE)
ethnicity <- sample(c('W','B','H','A','O'), 100, replace = TRUE)
grade <- sample(LETTERS[1:4], 100, replace=TRUE)
df <- data.frame(id, gender, age, ethnicity, grade)
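melt() keeps the source column name in a column called variable, so the Column Name column from the question can be recovered by putting variable on the left-hand side of the dcast() formula as well. A minimal sketch, assuming the data above; it converts the measure columns to character first so melt() does not have to coerce mixed types (age is an integer here), and dt, cols, long and result are illustrative names:

```r
library(data.table)

set.seed(123)
df <- data.frame(
  id        = 1:100,
  gender    = sample(c('M', 'F'), 100, replace = TRUE),
  age       = sample(18:22, 100, replace = TRUE),
  ethnicity = sample(c('W', 'B', 'H', 'A', 'O'), 100, replace = TRUE),
  grade     = sample(LETTERS[1:4], 100, replace = TRUE)
)

dt <- as.data.table(df)[, id := NULL]  # copy, then drop the identifier column
cols <- c('gender', 'age', 'ethnicity')
dt[, (cols) := lapply(.SD, as.character), .SDcols = cols]  # one common type

long <- melt(dt, id.vars = 'grade', measure.vars = cols)
# `variable` holds the source column name; keeping it on the left-hand side
# adds the "Column Name" column, and length() counts rows (0 for empty cells)
result <- dcast(long, variable + value ~ grade, fun.aggregate = length)
print(result)
```

Since variable is a factor whose levels follow measure.vars, the rows should come out grouped as gender, age, ethnicity, matching the question's layout.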
Source: https://stackoverflow.com/questions/59621553/creating-a-crosstabs-style-output