An NA in subsetting a data.frame does something unexpected

梦谈多话 2020-12-01 11:28

Consider the following code. When you don't explicitly test for NA in your condition, that code will fail at some later date when your data changes.



        
1 answer
  • 2020-12-01 11:32

    I definitely agree that this isn't intuitive (I made that point before on SO). In defense of R, I think that knowing when you have a missing value is useful (i.e. this is not a bug). The == operator is explicitly designed to notify the user of NA or NaN values. See ?"==" for more information. It states:

    Missing values ('NA') and 'NaN' values are regarded as non-comparable even to themselves, so comparisons involving them will always result in 'NA'.

    In other words, a missing value can't be compared using an operator such as ==, because its value is unknown, so the result of the comparison is unknown too.
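To make that concrete, here is a minimal sketch of how an NA in the comparison leaks into a subset. The data frame is hypothetical; only the names `a` and `col2` are taken from the answer, and `col1` and the values are assumptions made for illustration.

```r
# Hypothetical data; only the names `a` and `col2` come from the answer.
a <- data.frame(col1 = c("x", "y", "z"), col2 = c(1, NA, 2))

# The comparison itself is NA wherever col2 is NA.
cmp <- a$col2 == 2            # FALSE NA TRUE

# An NA in a logical row index produces an all-NA row instead of dropping it.
naive <- a[cmp, ]
nrow(naive)                   # 2: one real match plus one all-NA row

# Testing for NA explicitly keeps only the rows known to match.
guarded <- a[!is.na(a$col2) & a$col2 == 2, ]
nrow(guarded)                 # 1
```

The unguarded version only misbehaves once an NA actually appears in `col2`, which is why this kind of code can work for a long time and then break when the data changes.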

    Beyond is.na(), you could also do:

    which(a$col2==2) # tests explicitly for TRUE
    

    Or

    a$col2 %in% 2 # only checks for 2
    

    %in% is defined as using the match() function:

    "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
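As a rough illustration of why both alternatives avoid the NA row, the sketch below runs them on a small hypothetical vector `x` (an assumption, standing in for `a$col2`).

```r
# Hypothetical vector standing in for a$col2.
x <- c(1, NA, 2)

# which() returns only the positions where the comparison is TRUE;
# positions where it is NA or FALSE are left out.
idx <- which(x == 2)                   # 3

# %in% never returns NA: a value either matches the table or it does not.
hit <- x %in% 2                        # FALSE FALSE TRUE

# The same result spelled out with match(), as in the definition above.
same <- match(x, 2, nomatch = 0) > 0   # FALSE FALSE TRUE

# Unlike ==, match() treats NA as matching NA.
na_hit <- NA %in% NA                   # TRUE
```

Either `a[which(a$col2 == 2), ]` or `a[a$col2 %in% 2, ]` would then select only the rows known to match.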
    

    This is also covered in "The R Inferno".

    Checking for NA values in your data is crucial in R, because many important operators and functions don't handle them the way you might expect. Beyond ==, this is also true for &, |, <, sum(), and so on. Whenever I write R code, I ask myself "what would happen if there were an NA here?" Requiring an R user to be careful with missing values is "by design".
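A short sketch of the same behaviour outside of ==, using a hypothetical vector `x` (an assumption, not part of the original question):

```r
# Hypothetical vector containing one missing value.
x <- c(1, NA, 3)

# A single NA makes the whole sum unknown.
total <- sum(x)                      # NA

# na.rm = TRUE drops missing values before summing.
total_known <- sum(x, na.rm = TRUE)  # 4

# Relational operators propagate NA element by element.
below_two <- x < 2                   # TRUE NA FALSE

# anyNA() is a quick way to check before relying on a result.
has_missing <- anyNA(x)              # TRUE
```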

    Update: How is NA handled when there are multiple logical conditions?

    NA is a logical constant, and you might get unexpected subsetting results if you don't think about what a compound condition can return (e.g. NA | TRUE is TRUE, but NA & TRUE is NA). These truth tables from ?Logic may provide a useful illustration:

    x <- c(NA, FALSE, TRUE)
    names(x) <- as.character(x)

    outer(x, x, "&") ## AND table
    #       <NA> FALSE  TRUE
    #<NA>     NA FALSE    NA
    #FALSE FALSE FALSE FALSE
    #TRUE     NA FALSE  TRUE
    
    outer(x, x, "|") ## OR  table
    #      <NA> FALSE TRUE
    #<NA>    NA    NA TRUE
    #FALSE   NA FALSE TRUE
    #TRUE  TRUE  TRUE TRUE
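Applied to subsetting, the tables above mean a compound condition can silently keep, drop, or NA-fill a row depending on the other operand. A minimal sketch, with hypothetical condition vectors `cond1` and `cond2` chosen to cover the NA cases:

```r
# Hypothetical logical conditions, e.g. the results of two comparisons.
cond1 <- c(TRUE, NA,   NA,    NA)
cond2 <- c(TRUE, TRUE, FALSE, NA)

# AND: NA & FALSE is FALSE, every other NA combination stays NA.
both <- cond1 & cond2          # TRUE NA FALSE NA

# OR: NA | TRUE is TRUE, every other NA combination stays NA.
either <- cond1 | cond2        # TRUE TRUE NA NA

# which() keeps only the positions that are definitely TRUE.
keep_both <- which(both)       # 1
keep_either <- which(either)   # 1 2
```

Position 2 is the easy one to miss: it is selected by the OR condition even though `cond1` is unknown there.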
    