Summarizing data in a cross-table with a group_by variable in columns

Question

I am trying to summarize data across two variables, and the output of summarize is very long (at least in R Notebook output, where the table breaks over multiple pages). I'd like one variable as the rows of the summary output and the other as the columns, with each cell holding the mean for that combination of row and column. Some example data:

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),
  value = rnorm(12)
)

I would usually get my summary data frame like this:

library(dplyr)
dat1 %>% group_by(category, age) %>% summarize(mean(value))

This gives one row per combination of category and age. In my actual data each variable has 10+ levels, so the table is very long and hard to read. I would prefer a wide table with one column of means per age, which I created using:

dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))

Is there a better way than hand-coding each mean column?

Answer 1:

You just need to use tidyr in addition to dplyr: compute the means in long form, then spread the age values into columns:

library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')

Output is as follows:

Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
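
In newer tidyr releases (1.0.0 and later), spread() is superseded by pivot_wider(). A minimal sketch of the same reshaping follows. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument). The set.seed() call and the fixed age pattern are not from the question; they are added here only so the example is reproducible and every category contains both ages.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: seed added only for reproducibility
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),  # assumption: fixed pattern so both ages occur in each category
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop grouping so the result is a plain tibble
  summarise(mean = mean(value), .groups = "drop") %>%
  # one column per age level, named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

The names_prefix argument plays the role that sep = '' plays in the spread() call above: it turns the age values 1 and 2 into the column names age1 and age2. With 10+ levels in each variable the same code applies unchanged, since the columns are generated from whatever values age takes.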

Source: https://stackoverflow.com/questions/43877053/summarizing-data-in-cross-table-with-grouped-by-variable-in-columns

Tags

dplyr

tidyr