Dealing with NAs when calculating mean (summarize_each) on group_by

只谈情不闲聊 posted on 2019-11-30 19:35:22

The other answers showed you the syntax for passing mean(., na.rm = TRUE) into summarize/_each.

Personally, I run into this so often, and find it so annoying, that I just define the following set of NA-aware basic functions (e.g. in my .Rprofile). You can then apply them in dplyr with summarize(mean_) and no pesky arg-passing. It also keeps the source code cleaner and more readable, which is another strong plus:

mean_   <- function(...) mean(..., na.rm=TRUE)
median_ <- function(...) median(..., na.rm=TRUE)
sum_    <- function(...) sum(..., na.rm=TRUE)
sd_     <- function(v)   { v <- v[!is.na(v)]; sqrt(sum((v-mean(v))^2) / length(v)) } # population SD (divides by n); NAs dropped first
cor_    <- function(...) cor(..., use='pairwise.complete.obs')
table_  <- function(...) table(..., useNA='ifany')
mode_   <- function(...) {
  tab <- table(...)
  names(tab[tab==max(tab)]) # table() drops NA by default; all tied modes are returned, as character
}
clamp_  <- function(..., minval=0, maxval=70) pmax(minval, pmin(maxval,...))
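A minimal sketch of how such wrappers might be used in a grouped summary. The data frame `readings` and its columns are hypothetical, made up for illustration; only `mean_` and `median_` come from the definitions above, and they are repeated here so the snippet runs on its own.

```r
library(dplyr)

# Wrappers as defined above, repeated so this snippet is self-contained.
mean_   <- function(...) mean(..., na.rm = TRUE)
median_ <- function(...) median(..., na.rm = TRUE)

# Hypothetical toy data: two devices, each with one missing reading.
readings <- data.frame(
  device = c("a", "a", "a", "b", "b"),
  value  = c(1, NA, 3, 10, NA)
)

# No na.rm argument has to be threaded through the summarise() call.
by_device <- readings %>%
  group_by(device) %>%
  summarise(avg = mean_(value), mid = median_(value))

print(by_device)
```

With plain mean() and median() both summary columns would come back as NA for every group, since each group contains a missing value.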

What you really want is to flick one global switch once and for all, in the spirit of na.action with na.pass/na.omit/na.fail, that tells functions what to do with NAs by default, instead of having them throw errors or behave inconsistently across packages, as they currently do.
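To illustrate the inconsistency, here is a small base-R sketch (the data frame `d` is made up for the example). The na.action option is consulted by modelling functions such as lm(), but summary functions such as mean() ignore it and need na.rm passed explicitly.

```r
# The global option exists, but only some functions consult it.
print(getOption("na.action"))   # "na.omit" unless it has been changed

# Hypothetical toy data with one missing predictor value.
d <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2.1, 3.9, 6.2, 7.8, 10)
)

# lm() honours na.action: the incomplete row is dropped without an error.
fit    <- lm(y ~ x, data = d)
n_used <- nobs(fit)

# mean() does not: the NA propagates unless na.rm = TRUE is given.
m_default <- mean(d$x)
m_narm    <- mean(d$x, na.rm = TRUE)
```

Here the model is fitted on the four complete rows, while mean(d$x) returns NA until na.rm = TRUE is supplied.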

There used to be a CRAN package called Defaults for setting per-function defaults, but it has not been maintained since 2014 (pre-R 3.x). For more about it, see "Setting Function Defaults R on a Project Specific Basis".

Try:

 library(dplyr)
 md %>% group_by(device1, device2) %>%
        summarise_each(funs(mean(., na.rm = TRUE)))
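Note that summarise_each() and funs() have since been deprecated in dplyr; from dplyr 1.0.0 onward the same idea is usually written with across(). A sketch under that assumption, using a made-up stand-in for `md` (its value columns `temp` and `load` are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for `md`: two grouping columns, two numeric columns with NAs.
md <- data.frame(
  device1 = c("a", "a", "b", "b"),
  device2 = c("x", "x", "y", "y"),
  temp    = c(20, NA, 30, 34),
  load    = c(NA, 0.5, 0.7, 0.9)
)

# everything() covers all non-grouping columns; the lambda passes na.rm along.
means <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

print(means)
```

An NA-aware wrapper such as the mean_ shown earlier could be passed to across() directly in place of the lambda.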

Simple as that:

funs(mean(., na.rm = TRUE))