Using Group_by create aggregated counts conditional on value

问题

I have a data table that looks like this:

    serialno state type type2
1       100    FL    A     C
2       100    CA    A     D
3       101    CA    B     D
4       102    GA    A     C
5       103    WA    A     C
6       103    PA    B     C
7       104    CA    B     D
8       104    CA    B     C
9       105    NY    A     D
10      105    NJ    B     C

I need to create a new data table that is aggregated by serialno but calculates the count of each type of existing variables. So the end result would look like this.

    FL  CA  GA  A   B   C   D
100 1   1       2       1   1
101     1   1       1       1
102             1       1   
103 1       1   1   1   2   
104     2           2   1   1
105 1   1       1   1   1   1

I'm sure there is a solution using some combination of group_by and summarize, but I haven't been able to figure this out. Is the easiest solution just to create first spread out the state, type, and type2 columns and then use summarize to create counts?

回答1:

Using:

library(reshape2)
recast(dat, serialno ~ value, id.var = 'serialno', fun.aggregate = length)

gives:

  serialno A B C CA D FL GA NJ NY PA WA
1      100 2 0 1  1 1  1  0  0  0  0  0
2      101 0 1 0  1 1  0  0  0  0  0  0
3      102 1 0 1  0 0  0  1  0  0  0  0
4      103 1 1 2  0 0  0  0  0  0  1  1
5      104 0 2 1  2 1  0  0  0  0  0  0
6      105 1 1 1  0 1  0  0  1  1  0  0

Or:

library(dplyr)
library(tidyr)
dat %>% 
  gather(key, val, state:type2) %>% 
  group_by(serialno, val) %>% 
  tally() %>% 
  spread(val, n, fill = 0)

Or:

library(data.table)
dcast(melt(setDT(dat), id = 1), serialno ~ value, fun.aggregate = length)

Or (per @Frank's suggestion):

res <- table(melt(dat, id="serialno")[, c("serialno","value")])
print(res, zero.print="")

Of which the last one results in:

        value
serialno A B C CA D FL GA NJ NY PA WA
     100 2   1  1 1  1               
     101   1    1 1                  
     102 1   1          1            
     103 1 1 2                   1  1
     104   2 1  2 1                  
     105 1 1 1    1        1  1

来源：https://stackoverflow.com/questions/43262085/using-group-by-create-aggregated-counts-conditional-on-value

标签

dplyr

tidyr