In R, can I make the table() function return the number of NA values in a named element?

Question

I am using R to summarize a large amount of data for a report. I want to use lapply() to generate a list of tables with table(), from which I can extract the statistics I need. There are a lot of these, so I've written a function to do it. My problem is returning the number of missing (NA) values: the count is present in each table, but I can't figure out how to refer to the element of table() that holds it. As far as I can tell, R "names" that element NA, and I can't select an element by that name.

I'm trying to avoid writing some complex statement where I say something like which(is.na(names(element[1]))) | names(element[1])=="var_I_want" because I feel like that's just really wordy. I was hoping there was some way to either tell R to label the NA variable in each table with a character name, or to tell it to pick the one labeled NA, but I haven't had much luck yet.

Minimal example:

example <- data.frame(ID=c(10,20,30,40,50),
                      V1=c("A","B","A",NA,"C"),
                      V2=c("Dog","Cat",NA,"Cat","Bunny"),
                      V3=c("Yes","No","No","Yes","No"),
                      V4=c("No",NA,"No","No","Yes"),
                      V5=c("No","Yes","Yes",NA,"No"))

varlist <- c("V1","V2","V3","V4","V5")

list_o_tables <- lapply(X=example[varlist],FUN=table,useNA="always")

list(V1=list_o_tables[["V1"]]["A"],
     V2=list_o_tables[["V2"]]["Cat"],
     V3=list_o_tables[["V3"]]["Yes"],
     V4=list_o_tables[["V4"]]["Yes"],
     V5=list_o_tables[["V5"]]["Yes"])

What I get:

$V1
A 
2 

$V2
Cat 
  2 

$V3
Yes 
  2 

$V4
Yes 
  1 

$V5
Yes 
  2

What I'd like:

$V1
A     <NA>
2       1

$V2
Cat   <NA>
  2     1

$V3
Yes   <NA> 
  2     0

$V4
Yes   <NA> 
  1     1

$V5
Yes   <NA> 
  2     1

Answer 1:

This is ugly (IMHO) but it works:

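my_table <- function(x){
    tab <- table(x,useNA = "always")
    # With useNA = "always" the NA count is the last element; rename only that one to the
    # literal string 'NA', so factor inputs and their level order are left untouched
    setNames(tab,c(head(names(tab),-1),'NA'))
}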

So you'd lapply() this instead of table(), and then you can reach the NA count by name, e.g. tab["NA"].

Looking more closely, this is rooted in the behavior of factor:

levels(factor(c(1,NA,2),exclude = NULL))
[1] "1" "2" NA

My recollection is that the distinction between a factor level of NA versus "NA" has been at the very least a source of confusion in R in the past. I feel like I've seen some debates about the merits of this on r-devel, but I can't recall for sure at the moment.

So the issue is: if you have a factor with NA values, what do you call the levels? Technically, the current behavior is correct, because one of the levels is "missing", not literally "NA". It would be nice (IMHO) if table didn't adhere to this quite so strictly, though.
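
To make the distinction concrete, here is a minimal sketch using a small made-up vector (an assumption, not the asker's data). It shows that the name table() gives the missing-value count is a missing string rather than the text "NA", which is why a lookup by string comes back empty while a lookup via is.na() does not.

```r
# Assumption: a small illustrative vector, not taken from the question's data frame
x <- c("A", "B", "A", NA, "C")
tab <- table(x, useNA = "always")

# The last name is a missing string (NA_character_), not the text "NA"
nms <- names(tab)
print(is.na(nms))

# A character subscript never matches a missing name, so this lookup yields NA
by_string <- tab["NA"]

# Locating the missing name with is.na() does return the count of NA values
na_count <- tab[is.na(names(tab))]
print(na_count)
```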

Answer 2:

tab[match(NA, names(tab))] seems to work where tab[NA], tab[NA_character_], tab["NA_character_"], tab["<NA>"], etc. etc. fail...

f <- function(nms, obj) {
    # match() is vectorized and matches NA to NA, so no sapply() wrapper is needed
    obj[match(c(nms, NA), names(obj))]
}

f("Cat", list_o_tables[["V2"]])
#  Cat <NA> 
#    2    1 

mapply(f, list("A", "Cat", "Yes", "Yes", "Yes"), list_o_tables, SIMPLIFY=FALSE)
# [[1]]
# 
#    A <NA> 
#    2    1 
# 
# [[2]]
# 
#  Cat <NA> 
#    2    1 
# 
# [[3]]
# 
#  Yes <NA> 
#    2    0 
# 
# [[4]]
# 
#  Yes <NA> 
#    1    1 
# 
# [[5]]
# 
#  Yes <NA> 
#    2    1
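
As a rough illustration of why the match() route works: match() treats NA as equal to NA, so it can return the position of the missing name, and positional indexing then picks the element. The sketch below uses a standalone vector mirroring column V2 and is only meant to show the mechanism.

```r
# Standalone data mirroring column V2 of the question's example
tab <- table(c("Dog", "Cat", NA, "Cat", "Bunny"), useNA = "always")

# match() finds "Cat" by value and the NA-named element by NA-to-NA matching
pos <- match(c("Cat", NA), names(tab))
print(pos)

# Indexing by position sidesteps the problem of naming the NA element
picked <- tab[pos]
print(picked)
```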

Answer 3:

Why not just fix the names up after the fact?

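# useNA = "always" keeps an NA count (possibly zero) in every table, which the question's desired output needs for V3
tables <- lapply(example[-1], table, useNA = "always")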

fix_names <- function(x) {
  names(x)[is.na(names(x))] <- "<NA>"
  x
}
lapply(tables, fix_names)
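
Once the names are fixed, the extraction the question asks for could be done with ordinary character subscripts. The following is a sketch under a few assumptions: a two-column stand-in for the example data, tables built with useNA = "always", and a hypothetical lookup vector `wanted` that says which level to report for each column.

```r
# Assumption: a reduced stand-in for the question's data frame
example <- data.frame(V2 = c("Dog", "Cat", NA, "Cat", "Bunny"),
                      V3 = c("Yes", "No", "No", "Yes", "No"))

tables <- lapply(example, table, useNA = "always")

# Same renaming idea as above: give the NA element the printable name "<NA>"
fix_names <- function(x) {
  names(x)[is.na(names(x))] <- "<NA>"
  x
}
fixed <- lapply(tables, fix_names)

# Hypothetical lookup: the level of interest for each column
wanted <- c(V2 = "Cat", V3 = "Yes")

# Pull the wanted level and the NA count from each table by name
result <- Map(function(tab, value) tab[c(value, "<NA>")], fixed, wanted)
print(result)
```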

Answer 4:

When you set useNA="always", table() always puts the NA count last, so one way to do this is to use tail() to your advantage. Assuming we have your list from above (which I'll call l1)...

l1 <- list(V1=list_o_tables[["V1"]]["A"],
     V2=list_o_tables[["V2"]]["Cat"],
     V3=list_o_tables[["V3"]]["Yes"],
     V4=list_o_tables[["V4"]]["Yes"],
     V5=list_o_tables[["V5"]]["Yes"])

We can get the NA counts and then join them like this:

l2 <- lapply( list_o_tables , tail , 1 )
mapply( c , l1, l2 , SIMPLIFY = FALSE )
#$V1
#   A <NA> 
#   2    1 

#$V2
# Cat <NA> 
#   2    1 

#$V3
# Yes <NA> 
#   2    0 

#$V4
# Yes <NA> 
#   1    1 

#$V5
# Yes <NA> 
#   2    1
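
One caveat with a position-based approach is that it depends on useNA="always". The sketch below, using a made-up vector with no missing values, compares the last element under "always" and "ifany". It uses plain positional indexing (`tab[length(tab)]`) as a stand-in for tail(tab, 1), which selects the same element.

```r
# Assumption: an illustrative vector with no missing values (like column V3)
x <- c("Yes", "No", "No", "Yes", "No")

always <- table(x, useNA = "always")
ifany  <- table(x, useNA = "ifany")

# With "always", the last element is the NA count, even when it is zero
last_always <- always[length(always)]
print(last_always)

# With "ifany", no NA element exists here, so the last element is a real level
last_ifany <- ifany[length(ifany)]
print(last_ifany)
```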

Source: https://stackoverflow.com/questions/20434764/in-r-can-i-make-the-table-function-return-the-number-of-na-values-in-a-named
