Is it possible to set na.rm to TRUE globally?

Posted by 放荡痞女 on 2019-11-27 02:35:50

Question


For functions like max, the na.rm argument defaults to FALSE. I understand why this is a good idea in general, but I'd like to override that default reversibly for a while, i.e. for the duration of a session.

How can I require R to set na.rm = TRUE whenever it is an option? I found

options(na.action = na.omit)

but this doesn't work. I know that I can set a na.rm=TRUE option for each and every function I write.

my.max <- function(x) {max(x, na.rm=TRUE)}

But that's not what I am looking for. I'm wondering if there's something I could do more globally/universally instead of doing it for each function.


Answer 1:


One (dangerous) workaround is to do the following:

  1. List all functions that have na.rm as argument. Here I limited my search to the base package.
  2. Fetch each function and add this line at the beginning of its body: na.rm = TRUE
  3. Assign the function back to the base package.

First I collect, in a character vector (na.rm.f), the names of all functions that have na.rm as an argument:

uses_arg <- function(x,arg) 
  is.function(fx <- get(x)) && 
  arg %in% names(formals(fx))
basevals <- ls(pos="package:base")      
na.rm.f <- basevals[sapply(basevals,uses_arg,'na.rm')]

EDIT: a better way to get all functions that take an na.rm argument (thanks to mnel's comment):

Funs <- Filter(is.function,sapply(ls(baseenv()),get,baseenv()))
na.rm.f <- names(Filter(function(x) any(names(formals(args(x)))%in% 'na.rm'),Funs))
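A small check, offered as a sketch rather than as part of the original answer, of why going through args() presumably matters here: primitive functions such as sum have no R-level formals, so formals() alone returns NULL for them, while args() returns a closure that carries the same argument list.

```r
# Primitives have no R-level formals, so formals() alone cannot see na.rm.
sum_is_primitive <- is.primitive(sum)        # TRUE
direct_formals <- formals(sum)               # NULL

# args() wraps the primitive in a closure that exposes its arguments.
prim_args <- names(formals(args(sum)))       # includes "na.rm"

# For an ordinary closure both routes work.
clos_args <- names(formals(mean.default))    # includes "na.rm"

print(prim_args)
print(clos_args)
```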

The resulting na.rm.f vector looks like this:

 [1] "all"                     "any"                     "colMeans"                "colSums"                
 [5] "is.unsorted"             "max"                     "mean.default"            "min"                    
 [9] "pmax"                    "pmax.int"                "pmin"                    "pmin.int"               
[13] "prod"                    "range"                   "range.default"           "rowMeans"               
[17] "rowsum.data.frame"       "rowsum.default"          "rowSums"                 "sum"                    
[21] "Summary.data.frame"      "Summary.Date"            "Summary.difftime"        "Summary.factor"         
[25] "Summary.numeric_version" "Summary.ordered"         "Summary.POSIXct"         "Summary.POSIXlt" 

Then I change the body of each function. The code is inspired by the data.table package (FAQ 2.23), which adds one line to the start of rbind.data.frame and cbind.data.frame.

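ll <- lapply(na.rm.f,function(x)
  {
  tt <- get(x)
  ss = body(tt)
  # wrap a single-expression body in { } so a statement can be prepended
  if (class(ss)!="{") ss = as.call(c(as.name("{"), ss))
  # nothing to patch (e.g. primitives, whose body is NULL): just print the name
  if(length(ss) < 2) print(x)
  else{
    # guard meant to avoid patching the same function twice
    if (!length(grep("na.rm = TRUE",ss[[2]],fixed=TRUE))) {
      ss = ss[c(1,NA,2:length(ss))]              # make room for a new second element
      ss[[2]] = parse(text="na.rm = TRUE")[[1]]  # put na.rm = TRUE there
      body(tt)=ss
      # bindings in base are locked: unlock, overwrite, lock again
      (unlockBinding)(x,baseenv())
      assign(x,tt,envir=asNamespace("base"),inherits=FALSE)
      lockBinding(x,baseenv())
      }
    }
  })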

Now, if you check the first statement of each function in our list:

unique(lapply(na.rm.f,function(x) body(get(x))[[2]]))
[[1]]
na.rm = TRUE
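The body-rewriting idea can be tried safely on a throwaway function before touching base. The sketch below is not the answer's code: toy_sum is a hypothetical function, and the statement is inserted with append() rather than with the indexing trick used above.

```r
# Hypothetical throwaway function; nothing in base is modified here.
toy_sum <- function(x, na.rm = FALSE) {
  sum(x, na.rm = na.rm)
}
before <- toy_sum(c(2, NA, 3))             # NA

# Split the body into its parts: the "{" symbol followed by the statements.
stmts <- as.list(body(toy_sum))
# Insert an assignment right after "{" so it runs first.
stmts <- append(stmts, list(quote(na.rm <- TRUE)), after = 1)
body(toy_sum) <- as.call(stmts)

after <- toy_sum(c(2, NA, 3))              # 5
forced <- toy_sum(c(2, NA, 3), na.rm = FALSE)  # also 5: the caller's value is overwritten
first_stmt <- deparse(body(toy_sum)[[2]])
print(first_stmt)
```

Note that after the patch an explicit na.rm = FALSE from the caller is silently ignored, which is one reason this approach is dangerous.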



Answer 2:


It is not possible to change na.rm to TRUE globally. (See Hong Ooi's comment under the question.)

EDIT:

Unfortunately, the answer you don't want is the only one that works generally. There's no global option for this like there is for na.action, which only affects modeling functions like lm, glm, etc (and even there, it isn't guaranteed to work in all cases). – Hong Ooi Jul 2 '13 at 6:23
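The distinction drawn in the comment can be checked directly. This is a minimal sketch with made-up data: the na.action option is picked up by a modeling function such as lm, but has no influence on max.

```r
x <- c(2, NA, 3)

# "na.omit" is already R's default for this option; set explicitly for clarity.
old <- options(na.action = "na.omit")

max_result <- max(x)                       # still NA: max ignores na.action

# Made-up data with one missing predictor value.
d <- data.frame(x = c(1, 2, 3, NA), y = c(2.1, 3.9, 6.2, 8))
fit <- lm(y ~ x, data = d)
n_used <- nobs(fit)                        # 3: the incomplete row was dropped

options(old)                               # restore the previous setting
print(max_result)
print(n_used)
```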




Answer 3:


For my R package, I overwrote the existing functions mean and sum. Thanks to the great Ben (comments below), I altered my functions to this:

mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}

After this, mean(c(2, NA, 3)) = 2.5 instead of NA.

And for sum:

sum <- function(x, ..., na.rm = TRUE) {
  base::sum(x, ..., na.rm = na.rm)
}

This will yield sum(c(2, NA, 3)) = 5 instead of NA.

sum(c(2, NA, 3, NaN)) also works.
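Outside a package, for example in an interactive session, a mask like this is also easy to undo, which matches the "reversibly" part of the question. A minimal sketch, assuming the wrapper was defined in the current workspace:

```r
# Mask base::mean with a wrapper whose default is na.rm = TRUE.
mean <- function(x, ..., na.rm = TRUE) {
  base::mean(x, ..., na.rm = na.rm)
}

v <- c(2, NA, 3)
masked_default  <- mean(v)                 # 2.5
masked_explicit <- mean(v, na.rm = FALSE)  # NA: callers can still opt out

# Removing the wrapper makes base::mean visible again.
rm(mean)
restored <- mean(v)                        # NA
```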




Answer 4:


Several answers already cover changing the na.rm argument globally. I just want to point out the partial() function from the purrr or pryr packages. With it you can create a copy of an existing function with some arguments pre-filled:

library(purrr)
.mean <- partial(mean, na.rm = TRUE)

# Create sample vector
df <- c(1, 2, 3, 4, NA, 6, 7)

mean(df)
>[1] NA

.mean(df)
>[1] 3.833333

We can combine this tip with @agstudy's answer and create copies of all functions that take na.rm, with na.rm = TRUE preset:

library(purrr)

# Create a vector of function names https://stackoverflow.com/a/17423072/9300556
Funs <- Filter(is.function,sapply(ls(baseenv()),get,baseenv()))
na.rm.f <- names(Filter(function(x) any(names(formals(args(x)))%in% 'na.rm'),Funs))

# Create strings. Dot "." is optional
fs <- lapply(na.rm.f,
             function(x) paste0(".", x, "=partial(", x ,", na.rm = TRUE)"))

eval(parse(text = fs)) 

So now, there are .all, .min, .max, etc. in our .GlobalEnv. You can run them:

.min(df)
> [1] 1
.max(df)
> [1] 7
.all(df)
> [1] TRUE

To overwrite the original functions instead, remove the dot "." from the paste0 call inside lapply. Inspired by this blogpost
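If purrr is not available, a similar effect can be sketched in base R. Everything named below (default_na_rm, na_rm_env, "na_rm_defaults") is hypothetical and not part of any package. The wrappers live in their own environment, so attaching it switches the new defaults on and detaching it switches them off again.

```r
# Hypothetical helper: wrap f so na.rm defaults to TRUE but can still be overridden.
default_na_rm <- function(f) {
  force(f)
  function(..., na.rm = TRUE) f(..., na.rm = na.rm)
}

# Build wrappers for a few base functions in a separate environment.
na_rm_env <- new.env()
for (nm in c("mean", "sum", "min", "max")) {
  assign(nm, default_na_rm(get(nm, envir = baseenv())), envir = na_rm_env)
}

v <- c(1, 2, 3, 4, NA, 6, 7)

attach(na_rm_env, name = "na_rm_defaults")   # switch on: wrappers mask base
on_max <- max(v)                             # 7
on_sum <- sum(v)                             # 23
on_explicit <- max(v, na.rm = FALSE)         # NA: explicit values still win

detach("na_rm_defaults")                     # switch off
off_max <- max(v)                            # NA
```

R reports the masked objects when attach() is called, which serves as a reminder that the defaults are currently changed.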



Source: https://stackoverflow.com/questions/17418640/is-it-possible-to-set-na-rm-to-true-globally
