How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

無奈伤痛 2020-12-04 18:34

I have the following issue using R. In short, I want to create multiple new columns in a data frame, each computed from a different pair of existing columns.

8 answers
  •  轻奢々 (original poster)
    2020-12-04 19:33
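
The question's input data is not reproduced on this page. Working backwards from the printed results, the snippets in this answer appear to assume a data frame like the one sketched here. The name DF comes from the answer itself, but the values are reconstructed from the output tables and should be treated as an assumption.

```r
# Assumed input, reconstructed from the output tables in this answer
DF <- data.frame(
  a1 = c(1, 2, 4, 5),
  b1 = c(4, 5, 7, 8),
  c1 = c(10, 11, 13, 14),
  a2 = c(9, 10, 12, 13),
  b2 = c(3, 4, 6, 7),
  c2 = c(15, 16, 18, 19)
)

# Each letter has a "1" and a "2" column; the goal is one sum per letter
sum_a <- DF$a1 + DF$a2
sum_a  # 10 12 16 18, matching the sum_a column in the printed results
```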

    1) dplyr/tidyr: convert to long form, sum within each group, and convert back to wide form:

    library(dplyr)
    library(tidyr)
    
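    DF %>%
      mutate(Row = 1:n()) %>%                           # remember each original row
      gather(colname, value, -Row) %>%                  # long form: one row per cell
      group_by(g = gsub("\\d", "", colname), Row) %>%   # group by letter prefix and row
      summarize(sum = sum(value)) %>%
      ungroup %>%
      mutate(g = paste("sum", g, sep = "_")) %>%        # a -> sum_a, etc.
      spread(g, sum) %>%                                # back to wide form
      arrange(Row) %>%
      cbind(DF, .) %>%                                  # append the sums to the input
      select(-Row)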
    

    giving:

      a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
    1  1  4 10  9  3 15    10     7    25
    2  2  5 11 10  4 16    12     9    27
    3  4  7 13 12  6 18    16    13    31
    4  5  8 14 13  7 19    18    15    33
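
Since the question title asks specifically about purrr's map together with dplyr::mutate, the same sums can also be sketched without reshaping. This is not part of the original answer. It assumes the input DF reconstructed from the printed output, and the names stems, prefixes, sums and result are illustrative only.

```r
library(dplyr)
library(purrr)

# Assumed input, reconstructed from the output tables in this answer
DF <- data.frame(
  a1 = c(1, 2, 4, 5), b1 = c(4, 5, 7, 8), c1 = c(10, 11, 13, 14),
  a2 = c(9, 10, 12, 13), b2 = c(3, 4, 6, 7), c2 = c(15, 16, 18, 19)
)

# Column names with the digits removed, and the distinct prefixes among them
stems <- gsub("\\d", "", names(DF))
prefixes <- unique(stems)

# One row-wise sum per prefix, returned as a named list of numeric vectors
sums <- map(
  set_names(prefixes, paste0("sum_", prefixes)),
  function(p) unname(rowSums(DF[stems == p]))
)

# Splice the named list into mutate() so each element becomes a new column
result <- DF %>% mutate(!!!sums)
result
```

Because map() keeps the names of its input, the list elements are already called sum_a, sum_b and sum_c when they are spliced into mutate().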
    

    2) base using matrix multiplication

    nms holds the column names with the digit removed and sum_ prepended; u holds its unique elements. outer then builds a logical matrix marking which column of DF belongs to which group. Multiplying as.matrix(DF) by that matrix gives the sums, because the logicals are coerced to 0/1 in the product. Finally, bind the result to the input.

    nms <- gsub("(\\D+)\\d", "sum_\\1", names(DF))
    u <- unique(nms)
    sums <- as.matrix(DF) %*% outer(nms, setNames(u, u), "==")
    cbind(DF, sums)
    

    giving:

      a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
    1  1  4 10  9  3 15    10     7    25
    2  2  5 11 10  4 16    12     9    27
    3  4  7 13 12  6 18    16    13    31
    4  5  8 14 13  7 19    18    15    33
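
To see why the multiplication in (2) yields the sums, it can help to print the indicator matrix on its own. The sketch assumes the input DF reconstructed from the printed output; the name ind and the row labels are added here for readability and are not part of the original answer.

```r
# Assumed input, reconstructed from the output tables in this answer
DF <- data.frame(
  a1 = c(1, 2, 4, 5), b1 = c(4, 5, 7, 8), c1 = c(10, 11, 13, 14),
  a2 = c(9, 10, 12, 13), b2 = c(3, 4, 6, 7), c2 = c(15, 16, 18, 19)
)

nms <- gsub("(\\D+)\\d", "sum_\\1", names(DF))
u <- unique(nms)

# 6 x 3 logical matrix: entry [i, j] is TRUE when column i of DF is in group j
ind <- outer(nms, setNames(u, u), "==")
rownames(ind) <- names(DF)  # label the rows for readability only
ind

# Row 1 of DF against the sum_a column keeps only a1 and a2: 1 + 9
first_sum_a <- sum(as.matrix(DF)[1, ] * ind[, "sum_a"])
first_sum_a  # 10
```

Each column of ind selects exactly the DF columns sharing one prefix, so the matrix product adds those columns row by row.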
    

    3) base with tapply

    Using nms from (2), apply tapply to each row:

    cbind(DF, t(apply(DF, 1, tapply, nms, sum)))
    

    giving:

      a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
    1  1  4 10  9  3 15    10     7    25
    2  2  5 11 10  4 16    12     9    27
    3  4  7 13 12  6 18    16    13    31
    4  5  8 14 13  7 19    18    15    33
    

    tapply orders its groups by sorted name, so if the column names are not in ascending order you may wish to replace nms with factor(nms, levels = unique(nms)) in the above expression to keep the original order.
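
The effect of the factor() replacement can be seen on a small example. The data frame DF2 and the names nms2, sorted, f and kept are hypothetical and were made up for this illustration; only the tapply expression comes from the answer.

```r
# Hypothetical data frame whose columns are not in alphabetical order
DF2 <- data.frame(b1 = c(1, 2), a1 = c(10, 20), b2 = c(3, 4), a2 = c(30, 40))
nms2 <- gsub("(\\D+)\\d", "sum_\\1", names(DF2))

# Plain character grouping: tapply sorts the groups, so sum_a comes first
sorted <- t(apply(DF2, 1, tapply, nms2, sum))
colnames(sorted)

# Factor with explicit levels: groups follow first appearance, so sum_b comes first
f <- factor(nms2, levels = unique(nms2))
kept <- t(apply(DF2, 1, tapply, f, sum))
colnames(kept)
```

The sums themselves are identical in both versions; only the column order of the result changes.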
