aggregate methods treat missing values (NA) differently

孤城傲影 2020-11-27 19:11

Here's a simple data frame with a missing value:

M = data.frame( Name = c('name', 'name'), Col1 = c(NA, 1) , Col2 = c(1, 1))
#   Name Col1 Col2
# 1 name   NA    1
# 2 name    1    1


        
2 answers
  • 2020-11-27 19:46

    If you want the formula version to be equivalent to the non-formula one, try this:

    M = data.frame( Name = rep('name',5), Col1 = c(NA,rep(1,4)) , Col2 = rep(1,5))
    aggregate(. ~ Name, M, function(x) sum(x, na.rm=TRUE), na.action = na.pass)
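
    As a quick check of what this changes, the sketch below runs the formula method twice on the five-row frame above: once with `na.action = na.omit` spelled out (the documented default) and once with `na.pass`. The names `summed_default` and `summed_pass` are only illustrative.

    ```r
    # Five-row example frame from this answer: one NA in Col1, none in Col2.
    M <- data.frame(Name = rep("name", 5), Col1 = c(NA, rep(1, 4)), Col2 = rep(1, 5))

    # na.omit drops the whole first row before sum() ever sees it,
    # so Col2 loses a value as well.
    summed_default <- aggregate(. ~ Name, M, sum, na.action = na.omit)

    # na.pass keeps every row; the NA is then removed per column inside FUN.
    summed_pass <- aggregate(. ~ Name, M, function(x) sum(x, na.rm = TRUE),
                             na.action = na.pass)

    print(summed_default)  # Col1 = 4, Col2 = 4
    print(summed_pass)     # Col1 = 4, Col2 = 5
    ```

    The difference shows up in Col2, which has no missing values of its own but still loses the row that was dropped because of Col1.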
    
  • 2020-11-27 20:07

    Good question, but in my opinion, this shouldn't have caused a major debugging headache because it is documented quite clearly in multiple places in the manual page for aggregate.

    First, in the usage section:

    ## S3 method for class 'formula'
    aggregate(formula, data, FUN, ...,
              subset, na.action = na.omit)
    

    Later, in the description:

    na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
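
    Note that "ignore" here applies to whole rows, not to single cells. A minimal sketch of what `na.omit` does to the two-row frame from the question, assuming (as the usage above suggests) that the formula method applies it to the data before `FUN` is called; the name `kept` is only illustrative:

    ```r
    # Two-row frame from the question: NA in Col1 of the first row.
    M <- data.frame(Name = c("name", "name"), Col1 = c(NA, 1), Col2 = c(1, 1))

    # na.omit removes every row that contains at least one NA.
    kept <- na.omit(M)

    print(nrow(kept))       # 1: the first row is gone entirely
    print(sum(kept$Col2))   # 1: Col2 lost a value although it has no NA
    print(sum(M$Col2))      # 2: the sum over the untouched frame
    ```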


    I can't say why the formula method was written differently; that's something the function authors would have to answer. Using the information above, though, you can probably use the following:

    aggregate(.~Name, M, FUN=sum, na.rm=TRUE, na.action=NULL)
    #   Name Col1 Col2
    # 1 name    1    2
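
    For comparison, the data frame method of `aggregate` has no `na.action` argument, so missing values reach `FUN` unchanged. The sketch below uses the two-row frame from the question and the illustrative names `by_df` and `by_formula` to check that both methods agree once `na.rm = TRUE` is passed through to `sum`:

    ```r
    # Two-row frame from the question: NA in Col1 of the first row.
    M <- data.frame(Name = c("name", "name"), Col1 = c(NA, 1), Col2 = c(1, 1))

    # Data frame method: the NA is handed to sum(), which drops it via na.rm.
    by_df <- aggregate(M[c("Col1", "Col2")], by = list(Name = M$Name),
                       FUN = sum, na.rm = TRUE)

    # Formula method with row-wise NA handling switched off.
    by_formula <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE, na.action = NULL)

    print(by_df)
    print(by_formula)
    ```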
    