How to get the frequency from grouped data with dplyr?

99封情书 提交于 2019-12-14 03:07:06

问题


This certainly is a basic question but I cannot figure it out by myself. Please consider the following:

In a large dataset with patient characteristics in long format I want to summarise some variables. I would prefer to use dplyr for that.

For the example data set:

db <- data.frame(ID = c(rep(1, 3), rep(2,4), rep(3, 2)),
                  Gender = factor(c(rep("woman", 7), rep("man", 2))),
                  Grade = c(rep(3, 3), rep(1, 4), rep(2, 2)))
db
#    ID Gender Grade
#  1  1 woman     3
#  2  1 woman     3
#  3  1 woman     3
#  4  2 woman     1
#  5  2 woman     1
#  6  2 woman     1
#  7  2 woman     1
#  8  3   man     2
#  9  3   man     2

I would like to make a frequency table for Gender and Grade. Obviously, there are 2 female patients and 1 male. Each grade (1:3) occurs once.

I tried:

x <- db %>% group_by(ID, Gender, Grade)
  table(y$Gender)
x
# A tibble: 9 x 3
# Groups:   ID, Gender, Grade [3]
#     ID Gender Grade
#  <dbl> <fct>  <dbl>
# 1    1. woman     3.
# 2    1. woman     3.
# 3    1. woman     3.
# 4    2. woman     1.
# 5    2. woman     1.
# 6    2. woman     1.
# 7    2. woman     1.
# 8    3. man       2.
# 9    3. man       2.

but when I call for instance table(x$Gender), the outcome is:

table(y$Gender)

#    man woman 
#      2     7 

What am I doing wrong?

Thanks a lot in advance!

Edit: The desired output is to have a frequency table of how many male/female participants there are in the dataset, as well as how many patients have grade 1, 2, 3 etc. Please see below.

With the following I can call the percentage of females in db:

db %>%
summarise(pct.female = mean(Gender == "woman", na.rm = T))
#    pct.female
# 1  0.7777778

What I would rather need is the amount of males/females (n). Something like this:

# man    woman
#   1        2

回答1:


require(dplyr)
db %>% group_by(Gender, Grade) %>% tally()

# A tibble: 3 x 3
# Groups:   Gender [?]
  Gender Grade     n
  <fct>  <dbl> <int>
1 man     2.00     2
2 woman   1.00     4
3 woman   3.00     3

# Was also suggested by @konvas in their comment.

will tell you all the unique combinations of Gender and Grade. And how many each of those exist. This what you want? Difficult to say from your question. Desired output would be good.


edit Alternatively, as per requested output:

db %>% distinct(ID, Gender) %>% count(Gender) 

# A tibble: 2 x 2
  Gender `n()`
  <fct>  <int>
1 man        1
2 woman      2



回答2:


require(dplyr)
require(magrittr)
db %>% count(ID, Gender) %$% table(Gender)

Or, without dplyr

require(magrittr)
db %$% split(Gender, ID) %>% sapply(unique) %>% table


来源:https://stackoverflow.com/questions/49283009/how-to-get-the-frequency-from-grouped-data-with-dplyr

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!