What is the difference between <NA> and NA?

心在旅途 2020-12-13 17:35

I have a factor named SMOKE with levels "Y" and "N". Missing values were replaced with NA (from the initial level "NULL"). However, when I view the factor I get somethi

3 answers
  • 2020-12-13 17:54

    One exception may be data.table: there a character column prints a missing value as <NA>, while a numeric column prints it as NA. Converted to a factor, both display as <NA>.

    library("data.table")

    ## a character column
    y <- data.table(a = c("a", "b", NA))

    print(y)
          a
    1:    a
    2:    b
    3: <NA>

    factor(y$a)
    [1] a    b    <NA>
    Levels: a b

    ## we enter a numeric argument
    y <- data.table(a = c(1, 2, NA))

    print(y)
        a
    1:  1
    2:  2
    3: NA

    factor(y$a)
    [1] 1    2    <NA>
    Levels: 1 2
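
    The display can also be checked on a base data.frame. The following is a minimal sketch (the column names num and chr are made up for illustration) that relies on is.na() rather than on how the value happens to be printed:

    ```r
    # Hypothetical example data: one numeric and one character column
    df <- data.frame(num = c(1, 2, NA),
                     chr = c("a", "b", NA),
                     stringsAsFactors = FALSE)

    # Printing shows NA in the numeric column and <NA> in the character column
    print(df)

    # is.na() does not depend on how the value is displayed
    num_missing <- is.na(df$num)
    chr_missing <- is.na(df$chr)
    fac_missing <- is.na(factor(df$chr))
    ```

    In all three cases the third element is reported as missing, whatever the printed form.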
    
  • 2020-12-13 18:00

    When you are dealing with factors, an NA wrapped in angle brackets (<NA>) is in fact a missing value, NA.

    When NA appears without brackets, it is not a missing value but a proper factor level whose label is the string "NA".

    # Note a 'real' NA and a string with the word "NA"
    x <- factor(c("hello", NA, "world", "NA"))

    x
    # [1] hello <NA>  world NA
    # Levels: hello NA world     <~~ The string appears as a level, the actual NA does not.

    as.numeric(x)
    # [1]  1 NA  3  2            <~~ The string has a numeric value (here, 2, alphabetically)
    #                                The NA's numeric value is just NA
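
    To tell the two apart in code rather than by eye, one option is to combine is.na() with levels(). A small sketch using the same x as above (the variable names real_na, string_na and has_na_level are only illustrative):

    ```r
    x <- factor(c("hello", NA, "world", "NA"))

    # TRUE only for the real missing value
    real_na <- is.na(x)

    # Position of the element whose level is the string "NA";
    # which() drops the NA that the comparison yields for the missing element
    string_na <- which(x == "NA")

    # The string is a level, the missing value is not
    has_na_level <- "NA" %in% levels(x)
    ```

    Here real_na flags the second element and string_na points at the fourth.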
    

    Edit to answer @Arun's question:

    R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA. Hence the difference you see when displaying df versus df$y. Example:

    df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)
    

    Note the two different styles of NA:

    > df
      x    y
    1 1    a
    2 2 <NA>
    3 3    c
    4 4   NA
    

    However, if we look at just df$y:

    > df$y
    [1] "a"  NA   "c"  "NA"
    

    But, if we remove the quotation marks (similar to what we see when printing a data.frame to the console):

    print(df$y, quote=FALSE)
    [1] a    <NA> c    NA  
    

    And thus the angle brackets once again distinguish the missing value NA from the string "NA".

  • 2020-12-13 18:00

    It is just the way that R displays NA in a factor:

    > as.factor(NA)
    [1] <NA>
    Levels: 
    > 
    > f <- factor(c(1:3, NA))
    > levels(f)
    [1] "1" "2" "3"
    > f
    [1] 1    2    3    <NA>
    Levels: 1 2 3
    > is.na(f)
    [1] FALSE FALSE FALSE  TRUE
    

    Presumably this is how R differentiates between NA and "NA" when printing a factor, since factors print without quotes, even for character labels/levels:

    > f2 <- factor(c("NA",NA))
    > f2
    [1] NA   <NA>
    Levels: NA
    > is.na(f2)
    [1] FALSE  TRUE
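
    Applied to the situation in the question, a minimal sketch might look like this. The SMOKE values below are invented, and setting a level to NA is just one way to do the replacement; the question does not show how it was actually done:

    ```r
    # Hypothetical data standing in for the SMOKE factor from the question
    SMOKE <- factor(c("Y", "N", "NULL", "Y"))

    # Assigning NA to a level turns its elements into real missing values
    # and drops that level
    levels(SMOKE)[levels(SMOKE) == "NULL"] <- NA

    SMOKE            # the third element now displays as <NA>
    levels(SMOKE)    # only "N" and "Y" remain
    is.na(SMOKE)

    # Count missing values explicitly, since table() omits them by default
    smoke_counts <- table(SMOKE, useNA = "ifany")
    ```

    The <NA> shown when printing SMOKE is then simply the display of those missing values, not an extra level.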
    