Sub-function in grouping function using dplyr

Posted by 梦想与她 on 2019-12-25 08:48:18

Question


I'm using the dplyr package to count the missing values of each of my variables within subgroups.

I used a mini-function:

NAobs <- function(x) length(x[is.na(x)]) # function to count missing data for variables
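As a side note, the same count can be written more directly as the sum of the logical vector returned by is.na(). The sketch below uses a made-up vector and a hypothetical name, NAobs_sum, purely for comparison with the helper above:

```r
# Original helper: subset the NA elements, then take the length
NAobs <- function(x) length(x[is.na(x)])

# Hypothetical alternative: TRUE counts as 1, so the sum is the NA count
NAobs_sum <- function(x) sum(is.na(x))

x <- c(1, NA, 3, NA)  # made-up example vector with two missing values
NAobs(x)      # 2
NAobs_sum(x)  # 2
```

Both versions return the same number, so either can be used in the summarise() calls that follow.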

to count missing values. Because I have quite a few variables and wanted to add more information (sample size per group and the proportion of missing data per group), I wrote the following code and tested it with one variable (task_1).

library(dplyr)
group_by(DataRT, class) %>%
  summarise(class_size=length(class), missing = NAobs(task_1), perc.= missing/class_size)

This works very well and I receive a table like this:

   class class_size missing      perc.
   (dbl)      (int)   (int)      (dbl)
1      1         25       2 0.08000000
2      2         25       1 0.04000000
3      3         25       3 0.12000000
4      4         25       4 0.16000000
5      5         24       3 0.12500000
6      6         29       6 0.20689655
...

In the next step, I wanted to generalize the command by wrapping it in a function:

missing<-function(x, print=TRUE){
            group_by(DataRT, class) %>%
                    summarise(class_size=length(class), 
                        missing = NAobs(x),
                        perc.= missing/class_size)}

Ideally, I could now write missing(task_1) and get the same table. Instead, NAobs(x) ignores the grouping variable (class) and I get a table like this:

   class class_size missing    perc.
   (dbl)      (int)   (int)    (dbl)
1      1         25      59 2.360000
2      2         25      59 2.360000
3      3         25      59 2.360000
4      4         25      59 2.360000
5      5         24      59 2.458333
6      6         29      59 2.034483
...

So the column "missing" shows only the total number of NA cases for task_1, ignoring the groups. Replacing NAobs(x) with NAobs(variable name) would fix this, but it would defeat the purpose of writing a function in the first place. How can I calculate the number of missing cases per group without copying the code and changing the variable name each time? Thank you!


Answer 1:


New dplyr update. The newest dplyr solves this with two new functions, enquo and !!. The first quotes the function's argument much like substitute would; the second unquotes it so that it is evaluated inside the grouped data. For more on programming with dplyr, see the "Programming with dplyr" vignette.
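To make the difference concrete, here is a minimal sketch on a made-up data frame (toy, unquoted, and quoted are hypothetical names, not part of the question). It assumes that in the failing case the column reached the function as an ordinary, already evaluated vector, for example because it was passed as toy$task1 or was available after attach():

```r
library(dplyr)

# Made-up data: class 1 has one NA, class 2 has two
toy <- data.frame(class = c(1, 1, 2, 2, 2),
                  task1 = c(NA, 1, NA, NA, 1))
NAobs <- function(x) length(x[is.na(x)])

# Without quoting, x is the full 5-row vector, so every group
# sees the same overall count
unquoted <- function(x) {
  group_by(toy, class) %>%
    summarise(missing = NAobs(x))
}
totals <- unquoted(toy$task1)
totals$missing     # 3 3

# With enquo() and !!, the column name is looked up per group
quoted <- function(x) {
  my_var <- enquo(x)
  group_by(toy, class) %>%
    summarise(missing = NAobs(!!my_var))
}
per_group <- quoted(task1)
per_group$missing  # 1 2
```

The first function reproduces the symptom from the question (the same total in every row), while the second respects the grouping.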

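When this answer was written, you needed the development version of dplyr; enquo and !! have since shipped in the released package (dplyr 0.7.0 and later). I would also suggest installing the newest rlang.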

#install developer's version until new release in May
library(dplyr) #0.5.0.9004+

#Setup
set.seed(143)
NAobs <- function(x) length(x[is.na(x)])
DataRT <- data.frame(class = sample(1:6, 25, TRUE), task1 = sample(c(NA,1), 25, TRUE),
                     task2 = sample(c(NA,1), 25, TRUE))
f <- function(x) {
  my_var <- enquo(x)
  group_by(DataRT, class) %>%
    summarise(class_size = length(class),
              missing = NAobs(!!my_var),
              perc. = missing / class_size)
}
f(task1)
# # A tibble: 6 × 4
#   class class_size missing     perc.
#   <int>      <int>   <int>     <dbl>
# 1     1          5       0 0.0000000
# 2     2          4       2 0.5000000
# 3     3          3       0 0.0000000
# 4     4          1       0 0.0000000
# 5     5          5       3 0.6000000
# 6     6          7       3 0.4285714
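As a possible alternative, later releases offer two shortcuts that are not part of the original answer: the embrace operator {{ }} (rlang 0.4.0 and later), which quotes and unquotes in one step, and across() (dplyr 1.0.0 and later), which applies a function to several columns at once. The sketch below uses a made-up data frame; toy and missing_by_class are hypothetical names, and the data argument is an addition for illustration:

```r
library(dplyr)

# Made-up data, not the asker's DataRT
toy <- data.frame(
  class = c(1, 1, 2, 2, 2),
  task1 = c(NA, 1, NA, NA, 1),
  task2 = c(1, 1, NA, 1, 1)
)

# {{ var }} forwards the bare column name into summarise()
missing_by_class <- function(data, var) {
  data %>%
    group_by(class) %>%
    summarise(
      class_size = n(),
      missing = sum(is.na({{ var }})),
      perc = missing / class_size
    )
}

res <- missing_by_class(toy, task1)
res$missing  # 1 2

# Count NAs in every column whose name starts with "task"
all_tasks <- toy %>%
  group_by(class) %>%
  summarise(across(starts_with("task"), ~ sum(is.na(.x))))
all_tasks$task2  # 0 1
```

The across() version addresses the "many variables" part of the question directly, since no function call per variable is needed; the {{ }} version stays closer to the structure of f() above.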


Source: https://stackoverflow.com/questions/36024920/sub-function-in-grouping-function-using-dplyr
