Summarise over all columns

星月不相逢 2020-12-17 19:31

I have data of the following format:

gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
3 Answers
  • 2020-12-17 19:51

    I once did something similar, and at the time I ended up with:

    x %>%
      rowwise() %>%
      do(data.frame(., res = sum(unlist(.))))
    #    A  C G  T res
    # 1  3  2 8  6  19
    # 2  6  1 7 10  24
    # 3  4  8 6  7  25
    # 4  6  4 7  8  25
    # 5  6 10 7  2  25
    # 6  7  1 2  2  12
    # 7  5  4 8  5  22
    # 8  9  2 3  2  16
    # 9  3  4 7  6  20
    # 10 7  5 3  9  24
    

    Perhaps your more complex function works fine without unlist, but it seems to be necessary for sum. Because . refers to the "current group", I initially thought that ., e.g. for the first row in the rowwise machinery, would correspond to x[1, ], which is a list, and which sum happily accepts outside do:

    is.list((x[1, ]))
    # [1] TRUE
    
    sum(x[1, ])
    # [1] 19 
    

    However, without unlist in do an error is generated, and I am not sure why:

    x %>%
      rowwise() %>%
      do(data.frame(., res = sum(.)))
    # Error in sum(.) : invalid 'type' (list) of argument
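
    One possible explanation, offered as an assumption rather than something confirmed in this thread: inside rowwise() %>% do(), . may be a plain list of the row's values rather than a one-row data.frame. Base R's sum has a method for data.frames but not for plain lists, which would account for the difference. The sketch below shows only the base R behaviour; row_df and row_list are illustrative names, not objects from the question.

    ```r
    # One-row data.frame, comparable to x[1, ] (values are made up for illustration)
    row_df <- data.frame(A = 3, C = 2, G = 8, T = 6)

    # The same values as a plain list, i.e. without the data.frame class
    row_list <- as.list(row_df)

    # Works: data.frames have their own Summary group method
    total_df <- sum(row_df)

    # Fails: sum() does not accept a plain list; capture the message instead of stopping
    list_err <- tryCatch(sum(row_list), error = function(e) conditionMessage(e))

    # Works again once the list is flattened to an atomic vector
    total_unlisted <- sum(unlist(row_list))
    ```

    If this is indeed what happens inside do, it would explain why unlist is needed there even though sum(x[1, ]) works at top level.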
    
  • 2020-12-17 20:01

    Does this do what you'd like?

    library(dplyr)
    x %>%
       mutate(SumVar = rowSums(.))   # x is the data.frame from the question
    
  • 2020-12-17 20:04

    I'll try to show an example of what I wrote in my comment. Let's assume you have a custom-function f:

    f <- function(vec) sum(vec)^2
    

    And you want to apply this function to each row of your data.frame x. One option in base R would be to use apply, as you show in your question:

    transform(x, z = apply(x, 1, f))
    #   A  C  G T   z
    #1  5  7 10 7 841
    #2  1  9  5 9 576
    #3  7 10  2 4 529
    #4  1  4 10 1 256
    #5  4  4  5 2 225
    #6  9  1  6 8 576
    #7  9  3  7 1 400
    #8  5  2  7 5 361
    #9  6  3 10 4 529
    #10 5 10  1 6 484
    

    A small disadvantage here is that, because apply is used on a data.frame, the whole data.frame is first converted to a matrix, which means that all columns are coerced to a common type.
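
    To illustrate the coercion, here is a minimal sketch with a made-up two-column data.frame (mixed is a hypothetical name, not data from the question). Because one column is character, apply hands every row to the function as a character vector:

    ```r
    # Hypothetical data.frame with one numeric and one character column
    mixed <- data.frame(n = c(1.5, 2.5), s = c("a", "b"))

    # apply() converts the data.frame to a matrix first, so each row
    # reaches the function as a single atomic vector
    row_classes <- apply(mixed, 1, class)

    # The intermediate matrix is character, so the numbers have become strings
    m <- as.matrix(mixed)
    ```

    With the all-integer x from the question this does not matter, but it would if a non-numeric column were added.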

    With dplyr (and tidyr) you could solve the problem with gathering/melting and spreading/casting afterwards.

    library(dplyr)
    library(tidyr)
    x %>% 
      mutate(n = row_number()) %>%    # add row numbers for grouping 
      gather(key, value, A:T) %>%
      group_by(n) %>% 
      mutate(z = f(value)) %>%
      ungroup() %>%
      spread(key, value) %>%
      select(-n)
    
    #Source: local data frame [10 x 5]
    #
    #     z A  C  G T
    #1  841 5  7 10 7
    #2  576 1  9  5 9
    #3  529 7 10  2 4
    #4  256 1  4 10 1
    #5  225 4  4  5 2
    #6  576 9  1  6 8
    #7  400 9  3  7 1
    #8  361 5  2  7 5
    #9  529 6  3 10 4
    #10 484 5 10  1 6
    

    This is obviously quite a bit more code than the apply version, but as soon as the data get larger, I expect it to be a lot faster than any apply over the rows of a data.frame.

    Alternatively, you could use rowwise if you specify the columns manually:

    x %>%
      rowwise() %>%
      mutate(z = f(c(A,C,G,T)))  # manual column specification
    
    #Source: local data frame [10 x 5]
    #Groups: <by row>
    # 
    #  A  C  G T   z
    #1  5  7 10 7 841
    #2  1  9  5 9 576
    #3  7 10  2 4 529
    #4  1  4 10 1 256
    #5  4  4  5 2 225
    #6  9  1  6 8 576
    #7  9  3  7 1 400
    #8  5  2  7 5 361
    #9  6  3 10 4 529
    #10 5 10  1 6 484
    

    I haven't yet figured out whether the rowwise solution can be changed so that it works with character input of the column names, perhaps with lazyeval somehow.
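
    For readers on a newer dplyr, one way this might be done is with c_across() and all_of(). This is a sketch that assumes dplyr 1.0.0 or later, which is newer than the dplyr this answer was written against; it is not the lazyeval approach mentioned above. The vector cols is an illustrative name.

    ```r
    library(dplyr)

    f <- function(vec) sum(vec)^2

    # Same shape of data as in the question (random, so values will differ)
    gen <- function() sample.int(10, replace = TRUE)
    x <- data.frame(A = gen(), C = gen(), G = gen(), T = gen())

    # Column names supplied as character input
    cols <- c("A", "C", "G", "T")

    res <- x %>%
      rowwise() %>%
      mutate(z = f(c_across(all_of(cols)))) %>%   # combine the named columns per row
      ungroup()
    ```

    Here all_of() selects the columns named in cols, and c_across() combines their values for the current row into one vector before f is applied.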

    data:

    set.seed(16457)
    gen = function () sample.int(10, replace = TRUE)
    x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
    