ignore NA in dplyr row sum

自闭症患者 2020-11-27 17:27

Is there an elegant way to treat NA as 0 (i.e. na.rm = TRUE) when computing a row sum in dplyr?

data <- data.frame(a=c(1,2,3,4), b=c(4,NA,5,6), c=c(7,8,9,NA))

data %>% mutate(sum          


        
6 answers
  •  暗喜 (OP)
    2020-11-27 17:43

    Another option:

    data %>%
      mutate(sum = rowSums(., na.rm = TRUE))
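
One caveat with the `.` form is that it sums every column of the data frame, so `rowSums()` will fail if a non-numeric column is present. A minimal sketch of a variant that names the columns explicitly is shown here; it assumes dplyr 1.0.0 or later for `across()`, and reuses the sample `data` from the question:

```r
library(dplyr)

# Sample data from the question.
data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))

# Sum only the named columns. With na.rm = TRUE the NA cells are skipped,
# which has the same effect as treating them as 0.
result <- data %>%
  mutate(sum = rowSums(across(c(a, b, c)), na.rm = TRUE))

unname(result$sum)  # 12 10 17 10
```

Rows 2 and 4 each contain an NA, so their totals come from the two remaining values (2 + 8 and 4 + 6).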
    

    Benchmark

    library(dplyr)           # provides %>%, mutate(), rowwise()
    library(microbenchmark)
    # Each entry is one candidate approach, labelled by the answer it came from.
    # Note: mutate_each() and funs() are deprecated in current dplyr releases.
    mbm <- microbenchmark(
    steven = data %>% mutate(sum = rowSums(., na.rm = TRUE)), 
    lyz    = data %>% rowwise() %>% mutate(sum = sum(a, b, c, na.rm=TRUE)),
    nar    = apply(data, 1, sum, na.rm = TRUE),
    akrun  = data %>% mutate_each(funs(replace(., which(is.na(.)), 0))) %>% mutate(sum=a+b+c),
    frank  = data %>% mutate(sum = Reduce(function(x,y) x + replace(y, is.na(y), 0), ., 
                                         init=rep(0, n()))),
    times = 10)
    

    #Unit: milliseconds
    #   expr         min          lq       mean     median         uq        max neval cld
    # steven    9.493812    9.558736   18.31476   10.10280   22.55230   65.15325    10 a  
    #    lyz 6791.690570 6836.243782 6978.29684 6915.16098 7138.67733 7321.61117    10   c
    #    nar  702.537055  723.256808  799.79996  805.71028  849.43815  909.36413    10  b 
    #  akrun   11.372550   11.388473   28.49560   11.44698   20.21214  155.23165    10 a  
    #  frank   20.206747   20.695986   32.69899   21.12998   25.11939  118.14779    10 a 
    
