dplyr - using mutate() like rowmeans()

后悔当初 2020-12-01 07:55

I can't find the answer anywhere.

I would like to calculate a new variable in a data frame based on the mean of each row.

For example:

data &l         


        
6 Answers
  •  孤城傲影
    2020-12-01 08:21

    I think this is the dplyr-ish way. First, I'd create a function:

    my_rowmeans = function(...) Reduce(`+`, list(...))/length(list(...))
    

    Then, it can be used inside mutate:

    data %>% mutate(rms = my_rowmeans(a, b))
    
    #    id a b c rms
    # 1 101 1 2 3 1.5
    # 2 102 2 2 3 2.0
    # 3 103 3 2 3 2.5
    
    # or
    
    data %>% mutate(rms = my_rowmeans(a, b, c))
    
    #    id a b c      rms
    # 1 101 1 2 3 2.000000
    # 2 102 2 2 3 2.333333
    # 3 103 3 2 3 2.666667
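
    This works because `+` is vectorised: `Reduce` folds the columns together element by element, so you get one sum per row. A minimal base R sketch, assuming the column values shown in the printed output above (the vectors below are re-typed from that output, and `c_col` is just a stand-in name for column `c` so that `c()` isn't masked):

    ```r
    # Column values re-typed from the printed output (an assumption).
    a     <- c(1, 2, 3)
    b     <- c(2, 2, 2)
    c_col <- c(3, 3, 3)

    my_rowmeans <- function(...) Reduce(`+`, list(...)) / length(list(...))

    # Reduce(`+`, list(a, b, c_col)) expands to (a + b) + c_col,
    # an element-wise sum with one entry per row.
    row_sums <- Reduce(`+`, list(a, b, c_col))
    row_sums                      # 6 7 8

    # Dividing by the number of columns gives the row means.
    rms <- my_rowmeans(a, b, c_col)
    all.equal(rms, row_sums / 3)  # TRUE
    ```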
    

    To deal with the possibility of NAs, the function must be uglified:

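    my_rowmeans = function(..., na.rm=TRUE){
      # numerator terms: with na.rm, replace NAs by a zero of the column's type
      x = 
        if (na.rm) lapply(list(...), function(x) replace(x, is.na(x), as(0, class(x)))) 
        else       list(...)
    
      # denominator: number of non-NA values in each row
      d = Reduce(function(x,y) x+!is.na(y), list(...), init=0)
    
      Reduce(`+`, x)/d
    } 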
    
    # alternately...
    
    my_rowmeans2 = function(..., na.rm=TRUE) rowMeans(cbind(...), na.rm=na.rm)
    
    # new example
    
    data$b[2] <- NA  
    data %>% mutate(rms = my_rowmeans(a,b,na.rm=FALSE))
    
    #    id a  b c rms
    # 1 101 1  2 3 1.5
    # 2 102 2 NA 3  NA
    # 3 103 3  2 3 2.5
    
    data %>% mutate(rms = my_rowmeans(a,b))
    
    #    id a  b c rms
    # 1 101 1  2 3 1.5
    # 2 102 2 NA 3 2.0
    # 3 103 3  2 3 2.5
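
    To see where the 2.0 in row 2 comes from, it may help to look at the numerator and denominator separately. A small base R sketch, assuming the `a` and `b` values from the output above (the names `filled`, `num` and `den` are only for illustration):

    ```r
    a <- c(1, 2, 3)
    b <- c(2, NA, 2)
    cols <- list(a, b)

    # Numerator: NAs are replaced by 0 so they add nothing to the sum.
    filled <- lapply(cols, function(x) replace(x, is.na(x), 0))
    num <- Reduce(`+`, filled)                                    # 3 2 5

    # Denominator: running count of non-NA values in each row.
    den <- Reduce(function(acc, y) acc + !is.na(y), cols, init = 0)  # 2 1 2

    num / den                                                     # 1.5 2.0 2.5

    # Same result as rowMeans on the bound columns.
    all.equal(num / den, rowMeans(cbind(a, b), na.rm = TRUE))     # TRUE
    ```

    Row 2 has only one non-NA value, so it is divided by 1 rather than 2.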
    

    The downside of my_rowmeans2 is that cbind coerces the columns to a matrix. I'm not certain that this will always be slower than the Reduce approach, though.
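
    If the speed difference matters, one way to check it on your own data is a rough timing like the one below. This is only a sketch: the test vectors, their size and the repeat count are made up for illustration, and timings will vary with the machine and the number of columns.

    ```r
    # Both helpers as defined above.
    my_rowmeans <- function(..., na.rm = TRUE) {
      x <- if (na.rm) {
        lapply(list(...), function(x) replace(x, is.na(x), as(0, class(x))))
      } else {
        list(...)
      }
      d <- Reduce(function(x, y) x + !is.na(y), list(...), init = 0)
      Reduce(`+`, x) / d
    }
    my_rowmeans2 <- function(..., na.rm = TRUE) rowMeans(cbind(...), na.rm = na.rm)

    # Hypothetical test data: three numeric columns, a few NAs in one of them.
    set.seed(1)
    n <- 1e5
    a <- rnorm(n); b <- rnorm(n); c_col <- rnorm(n)
    b[sample(n, 100)] <- NA

    # Make sure the two versions agree before timing them.
    stopifnot(isTRUE(all.equal(my_rowmeans(a, b, c_col), my_rowmeans2(a, b, c_col))))

    # Rough timings over 20 repeats each.
    system.time(for (i in 1:20) my_rowmeans(a, b, c_col))
    system.time(for (i in 1:20) my_rowmeans2(a, b, c_col))
    ```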
