dplyr - using mutate() like rowmeans()

后悔当初 2020-12-01 07:55

I can't find the answer anywhere.

I would like to add a new variable to a data frame that holds the mean of each row.

For example:

data <- …
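
The example data is cut off in this copy of the question. Judging from the output printed in the answers, it was probably a small data frame like the one below. The exact values are an assumption reconstructed from those outputs, not the asker's original code.

```r
# Assumed reconstruction of the truncated example, inferred from the answers' output
data <- data.frame(
  id = c(101, 102, 103),
  a  = c(1, 2, 3),
  b  = c(2, 2, 2),
  c  = c(3, 3, 3)
)

# Base R gives the row means directly; the question is how to get
# the same result inside a dplyr pipeline.
row_means <- rowMeans(data[, c("a", "b", "c")])
```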


        
6 Answers
  • 2020-12-01 08:07

    dplyr is badly suited to this kind of data because it assumes a tidy data format, and for the problem in question your data is untidy.

    You can of course tidy it first:

    tidy_data = tidyr::gather(data, name, value, -id)
    

    Which looks like this:

       id name value
    1 101    a     1
    2 102    a     2
    3 103    a     3
    4 101    b     2
    5 102    b     2
    6 103    b     2
        …
    

    And then:

    tidy_data %>% group_by(id) %>% summarize(mean = mean(value))
    
         id     mean
      (dbl)    (dbl)
    1   101 2.000000
    2   102 2.333333
    3   103 2.666667
    

    Of course this discards the original data. You could use mutate instead of summarize to avoid this. Finally, you can then un-tidy your data again:

    tidy_data %>%
        group_by(id) %>%
        mutate(mean = mean(value)) %>%
        tidyr::spread(name, value)
    
         id     mean     a     b     c
      (dbl)    (dbl) (dbl) (dbl) (dbl)
    1   101 2.000000     1     2     3
    2   102 2.333333     2     2     3
    3   103 2.666667     3     2     3
    

    Alternatively, you could summarise and then merge the result with the original table:

    tidy_data %>%
        group_by(id) %>%
        summarize(mean = mean(value)) %>%
        inner_join(data, by = 'id')
    

    The result is the same in either case. I conceptually prefer the second variant.
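
gather() and spread() still work, but recent tidyr releases document pivot_longer() and pivot_wider() as their replacements. Below is a sketch of the summarise-and-join variant using the newer function. It assumes a tidyr version that provides pivot_longer() and the example data frame implied by the output above (columns id, a, b, c).

```r
library(dplyr)
library(tidyr)

# Assumed example data, reconstructed from the output shown in this answer
data <- data.frame(id = c(101, 102, 103), a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3))

row_means <- data %>%
  # Tidy: one row per (id, name) pair
  pivot_longer(-id, names_to = "name", values_to = "value") %>%
  group_by(id) %>%
  summarize(mean = mean(value)) %>%
  # Re-attach the original wide columns
  inner_join(data, by = "id")
```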

  • 2020-12-01 08:08

    Here are two more ways, useful if you know the numeric positions or the names of the columns to be summarised:

    data %>% mutate(d = rowMeans(.[, 2:4]))
    

    or

    data %>% mutate(d = rowMeans(.[, c("a","b","c")]))
    
  • 2020-12-01 08:21

    I think the answer that suggests using data.frame or slicing on . is the best, but it can be made simpler and more dplyr-ish like so:

    data %>% mutate(c = rowMeans(select(., a,b)))
    

    Or if you want to avoid ., with the penalty of having two inputs to your pipeline:

    data %>% mutate(c = rowMeans(select(data, a,b)))
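
In dplyr 1.1.0 and later, pick() selects columns inside mutate() without referring to . or naming data a second time. A minimal sketch, assuming that dplyr version and the example data from the question. The column name d is chosen here only to avoid overwriting the existing c.

```r
library(dplyr)

# Assumed example data (columns id, a, b, c)
data <- data.frame(id = c(101, 102, 103), a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3))

# pick() returns the selected columns as a data frame, which rowMeans() accepts
result <- data %>% mutate(d = rowMeans(pick(a, b)))
```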
    
  • 2020-12-01 08:21

    I think this is the dplyr-ish way. First, I'd create a function:

    my_rowmeans = function(...) Reduce(`+`, list(...))/length(list(...))
    

    Then, it can be used inside mutate:

    data %>% mutate(rms = my_rowmeans(a, b))
    
    #    id a b c rms
    # 1 101 1 2 3 1.5
    # 2 102 2 2 3 2.0
    # 3 103 3 2 3 2.5
    
    # or
    
    data %>% mutate(rms = my_rowmeans(a, b, c))
    
    #    id a b c      rms
    # 1 101 1 2 3 2.000000
    # 2 102 2 2 3 2.333333
    # 3 103 3 2 3 2.666667
    

    To deal with the possibility of NAs, the function must be uglified:

    my_rowmeans = function(..., na.rm=TRUE){
      x = 
        if (na.rm) lapply(list(...), function(x) replace(x, is.na(x), as(0, class(x)))) 
        else       list(...)
    
      d = Reduce(function(x,y) x+!is.na(y), list(...), init=0)
    
      Reduce(`+`, x)/d
    } 
    
    # alternately...
    
    my_rowmeans2 = function(..., na.rm=TRUE) rowMeans(cbind(...), na.rm=na.rm)
    
    # new example
    
    data$b[2] <- NA  
    data %>% mutate(rms = my_rowmeans(a,b,na.rm=FALSE))
    
       id a  b c rms
    1 101 1  2 3 1.5
    2 102 2 NA 3  NA
    3 103 3  2 3 2.5
    
    data %>% mutate(rms = my_rowmeans(a,b))
    
       id a  b c rms
    1 101 1  2 3 1.5
    2 102 2 NA 3 2.0
    3 103 3  2 3 2.5
    

    The downside of my_rowmeans2 is that it coerces its inputs to a matrix. I'm not certain that this will always be slower than the Reduce approach, though.
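
To check that the two approaches agree, here is a base R sketch that compares a Reduce-based mean against rowMeans(cbind(...)) on vectors containing an NA. The helper name reduce_rowmeans is hypothetical and is a simplified restatement of the idea above, not the answer's exact function.

```r
# Element-wise mean across vectors (hypothetical helper name)
reduce_rowmeans <- function(..., na.rm = TRUE) {
  cols <- list(...)
  # When na.rm is TRUE, treat NA as 0 in the sum
  filled <- if (na.rm) lapply(cols, function(v) replace(v, is.na(v), 0)) else cols
  total <- Reduce(`+`, filled)
  # Number of non-NA entries in each position
  n <- Reduce(`+`, lapply(cols, function(v) !is.na(v)))
  total / n
}

a <- c(1, 2, 3)
b <- c(2, NA, 2)

via_reduce <- reduce_rowmeans(a, b)
via_matrix <- unname(rowMeans(cbind(a, b), na.rm = TRUE))
```

Both give the same values here; which one is faster on large inputs would need to be measured.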

  • 2020-12-01 08:26

    You're looking for

    data %>% 
        rowwise() %>% 
        mutate(c=mean(c(a,b)))
    
    #      id     a     b     c
    #   (dbl) (dbl) (dbl) (dbl)
    # 1   101     1     2   1.5
    # 2   102     2     2   2.0
    # 3   103     3     2   2.5
    

    or

    library(purrr)
    data %>% 
        rowwise() %>% 
        mutate(c=lift_vd(mean)(a,b))
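
With dplyr 1.0.0 or later, c_across() can collect a range of columns within each row, which avoids listing them by hand. A sketch under that version assumption, using the example data and a new column d so that c is not overwritten:

```r
library(dplyr)

# Assumed example data (columns id, a, b, c)
data <- data.frame(id = c(101, 102, 103), a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3))

result <- data %>%
  rowwise() %>%
  # c_across(a:c) gathers the values of a, b and c for the current row
  mutate(d = mean(c_across(a:c))) %>%
  # Drop the row-wise grouping so later verbs behave normally
  ungroup()
```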
    
  • 2020-12-01 08:26

    Another simple option that needs very little code:

    data %>%
        mutate(c= rowMeans(data.frame(a,b)))
    
     #     id a b   c
     #  1 101 1 2 1.5
     #  2 102 2 2 2.0
     #  3 103 3 2 2.5
    

    As rowMeans needs something like a matrix or a data.frame, you can use data.frame(var1, var2, ...) instead of c(var1, var2, ...). If you have NAs in your data you'll need to tell R what to do, for example to remove them: rowMeans(data.frame(a,b), na.rm=TRUE)
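
The effect of na.rm can be seen in a small base R sketch. The vectors a and b here are illustrative stand-ins for the data frame columns, with one value set to NA.

```r
a <- c(1, 2, 3)
b <- c(2, NA, 2)

# Default: any NA in a row makes that row's mean NA
with_na <- rowMeans(data.frame(a, b))

# na.rm = TRUE: the mean of row 2 is computed from a alone
without_na <- rowMeans(data.frame(a, b), na.rm = TRUE)
```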
