How do I sweep specific columns with dplyr?

爱一瞬间的悲伤 2020-12-15 23:12

An incredibly common operation for my type of data is applying a normalisation factor to all columns. In base R this can be done efficiently using sweep() or scale().
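
For reference, the base R approach mentioned above might look roughly like the sketch below. The data frame is an assumption: it is back-computed from the output printed in the answers and the factors used there, not quoted from the question. The names cols, swept, scaled and normalised are illustrative only.

```r
# Assumed example data, back-computed from the answers' printed output.
data <- data.frame(
  ID   = 1:6,
  Type = c("X", "X", "X", "Y", "Y", "Y"),
  A = c(3, 174, 6, 1377, 537, 173),
  B = c(1, 128, 2, 1019, 424, 139),
  C = c(3, 66, 2, 250, 129, 40),
  D = c(4, 57, 4, 251, 124, 38)
)

# One normalisation factor per column, in column order.
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)
cols <- names(factors)

# sweep() with MARGIN = 2 divides every column by its own factor.
swept <- sweep(as.matrix(data[cols]), 2, factors, "/")

# scale() without centering divides each column by its entry in `scale`.
scaled <- scale(as.matrix(data[cols]), center = FALSE, scale = factors)

# Write the normalised values back, leaving ID and Type untouched.
normalised <- data
normalised[cols] <- as.data.frame(swept)
```

Both calls work on the numeric columns only, which is why ID and Type have to be set aside and re-attached by hand; that bookkeeping is what a dplyr solution would avoid.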

2 Answers
  • 2020-12-15 23:48

    Given akrun's encouragement, I am posting what I did as an answer. My intuition was that, inside mutate_each, R could be asked to match columns by name: if . stands for column A, then a column named A in another data frame should be reachable in the same way. So I created a data frame of factors and used it in mutate_each. The outcome looks right. I have no technical background, so I cannot offer a real explanation of why this works; I hope you do not mind.

    factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)
    
    # Divide each column by the column of the same name in factors.
    mutate_at(data, vars(A:D), funs(. / factors$.))
    
    # At the time I answered this question, the following worked,
    # but mutate_each() is now deprecated.
    
    # mutate_each(data, funs(. / factors$.), A:D)
    
    #  ID Type    A           B      C          D
    #1  1    X    3   0.8333333   3.75   5.333333
    #2  2    X  174 106.6666667  82.50  76.000000
    #3  3    X    6   1.6666667   2.50   5.333333
    #4  4    Y 1377 849.1666667 312.50 334.666667
    #5  5    Y  537 353.3333333 161.25 165.333333
    #6  6    Y  173 115.8333333  50.00  50.666667
    

    EDIT

    This also works with a plain list. Since a data frame is a special case of a list, this is perhaps not surprising.

    # Experiment
    foo <- list(A = 1, B = 1.2, C = 0.8, D = 0.75)
    
    mutate_at(data, vars(A:D), funs(. / foo$.))
    
    # mutate_each(data, funs(. / foo$.), A:D)
    
    #  ID Type    A           B      C          D
    #1  1    X    3   0.8333333   3.75   5.333333
    #2  2    X  174 106.6666667  82.50  76.000000
    #3  3    X    6   1.6666667   2.50   5.333333
    #4  4    Y 1377 849.1666667 312.50 334.666667
    #5  5    Y  537 353.3333333 161.25 165.333333
    #6  6    Y  173 115.8333333  50.00  50.666667
    
  • 2020-12-15 23:55

    From dplyr 1.0.0, you can do:

    data %>%
      rowwise() %>%
      mutate(across(A:D) / factors)
    
         ID Type      A       B      C      D
      <dbl> <chr> <dbl>   <dbl>  <dbl>  <dbl>
    1     1 X         3   0.833   3.75   5.33
    2     2 X       174 107.     82.5   76   
    3     3 X         6   1.67    2.5    5.33
    4     4 Y      1377 849.    312.   335.  
    5     5 Y       537 353.    161.   165.  
    6     6 Y       173 116.     50     50.7 
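
A variant that avoids rowwise() is to look each factor up by column name with cur_column(), which is available from dplyr 1.0.0. This is a sketch under assumptions rather than part of the original answer: the data frame is back-computed from the printed output, and result is an illustrative name. As far as I know, later dplyr releases also deprecate calling across() without a function in favour of pick(), so passing an explicit function as below may age better.

```r
library(dplyr)

# Assumed example data, back-computed from the printed output above.
data <- data.frame(
  ID   = 1:6,
  Type = c("X", "X", "X", "Y", "Y", "Y"),
  A = c(3, 174, 6, 1377, 537, 173),
  B = c(1, 128, 2, 1019, 424, 139),
  C = c(3, 66, 2, 250, 129, 40),
  D = c(4, 57, 4, 251, 124, 38)
)

# Same factors as in the first answer.
factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)

# cur_column() returns the name of the column across() is currently
# processing, so each column is divided by the factor of the same name.
result <- data %>%
  mutate(across(A:D, ~ .x / factors[[cur_column()]]))
```

Because the lookup is by name, the order of the columns in factors does not have to match the order in data, and the division stays vectorised over rows.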
    