I have the following issue using R. In short, I want to create multiple new columns in a data frame based on calculations on different pairs of columns in that data frame.
1) dplyr/tidyr
Convert to long form, summarize, and convert back to wide form:
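library(dplyr)
library(tidyr)

DF %>%
  mutate(Row = 1:n()) %>%                           # remember the original row
  gather(colname, value, -Row) %>%                  # long form: one row per cell
  group_by(g = gsub("\\d", "", colname), Row) %>%   # group by letter prefix and row
  summarize(sum = sum(value)) %>%
  ungroup %>%
  mutate(g = paste("sum", g, sep = "_")) %>%        # a -> sum_a, etc.
  spread(g, sum) %>%                                # back to wide form
  arrange(Row) %>%
  cbind(DF, .) %>%                                  # bind the sums to the input
  select(-Row)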
giving:
  a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  1  4 10  9  3 15    10     7    25
2  2  5 11 10  4 16    12     9    27
3  4  7 13 12  6 18    16    13    31
4  5  8 14 13  7 19    18    15    33
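For reference, gather() and spread() have been superseded in tidyr (since version 1.0.0) by pivot_longer() and pivot_wider(). Below is a rough sketch of the same long/summarize/wide round trip using the newer functions. The DF defined here is an assumption reconstructed from the output shown above, since the input itself is not given, and sums and result are only illustrative names.

```r
library(dplyr)
library(tidyr)

# Assumed input, reconstructed from the printed output above
DF <- data.frame(a1 = c(1, 2, 4, 5),     b1 = c(4, 5, 7, 8),
                 c1 = c(10, 11, 13, 14), a2 = c(9, 10, 12, 13),
                 b2 = c(3, 4, 6, 7),     c2 = c(15, 16, 18, 19))

sums <- DF %>%
  mutate(Row = row_number()) %>%                            # remember the original row
  pivot_longer(-Row, names_to = "colname", values_to = "value") %>%
  mutate(g = paste0("sum_", gsub("\\d", "", colname))) %>%  # "a1" -> "sum_a"
  group_by(Row, g) %>%
  summarize(sum = sum(value), .groups = "drop") %>%         # one sum per row and group
  pivot_wider(names_from = g, values_from = sum) %>%        # back to wide form
  arrange(Row) %>%
  select(-Row)

result <- cbind(DF, sums)
result
```

This should give the same nine columns as above; whether to switch is mostly a matter of which tidyr version you target.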
2) base using matrix multiplication
nms is a vector of column names without the digits and prefaced with sum_. u is a vector of its unique elements. Form a logical matrix from these using outer; multiplying as.matrix(DF) by it gives the sums, since the logicals are converted to 0-1 in the multiplication. Finally, bind the result to the input.
nms <- gsub("(\\D+)\\d", "sum_\\1", names(DF))  # e.g. "a1" -> "sum_a"
u <- unique(nms)
# logical indicator matrix: one row per column of DF, one column per group
sums <- as.matrix(DF) %*% outer(nms, setNames(u, u), "==")
cbind(DF, sums)
giving:
  a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  1  4 10  9  3 15    10     7    25
2  2  5 11 10  4 16    12     9    27
3  4  7 13 12  6 18    16    13    31
4  5  8 14 13  7 19    18    15    33
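To see why the multiplication in (2) works, it can help to print the indicator matrix on its own. The sketch below assumes a DF reconstructed from the output above (the input is not shown). The name ind and the row labels added to it are only for illustration.

```r
# Assumed input, reconstructed from the printed output above
DF <- data.frame(a1 = c(1, 2, 4, 5),     b1 = c(4, 5, 7, 8),
                 c1 = c(10, 11, 13, 14), a2 = c(9, 10, 12, 13),
                 b2 = c(3, 4, 6, 7),     c2 = c(15, 16, 18, 19))

nms <- gsub("(\\D+)\\d", "sum_\\1", names(DF))
u <- unique(nms)

# TRUE where a column of DF (row of ind) belongs to a group (column of ind)
ind <- outer(nms, setNames(u, u), "==")
rownames(ind) <- names(DF)   # labels added only to make the printout readable
ind
#    sum_a sum_b sum_c
# a1  TRUE FALSE FALSE
# b1 FALSE  TRUE FALSE
# c1 FALSE FALSE  TRUE
# a2  TRUE FALSE FALSE
# b2 FALSE  TRUE FALSE
# c2 FALSE FALSE  TRUE

# Each column of the product adds up the DF columns flagged TRUE for that group
sums <- as.matrix(DF) %*% ind
```

Note that the pattern strips a single trailing digit. If the suffixes could have several digits (a hypothetical a10, say), \\d+ in place of \\d would likely be needed.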
3) base with tapply
Using nms from (2), apply tapply to each row:
cbind(DF, t(apply(DF, 1, tapply, nms, sum)))
giving:
  a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  1  4 10  9  3 15    10     7    25
2  2  5 11 10  4 16    12     9    27
3  4  7 13 12  6 18    16    13    31
4  5  8 14 13  7 19    18    15    33
You may wish to replace nms with factor(nms, levels = unique(nms)) in the above expression if the names are not in ascending order, because tapply otherwise returns the groups in sorted order.
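As a small illustration of that last point, here is a sketch with a made-up two-row data frame whose columns are deliberately not in ascending order. DF2, nms2, f, sorted_sums and ordered_sums are hypothetical names used only for this example.

```r
# Hypothetical input with the "b" columns before the "a" columns
DF2 <- data.frame(b1 = c(4, 5), a1 = c(1, 2), b2 = c(3, 4), a2 = c(9, 10))

nms2 <- gsub("(\\D+)\\d", "sum_\\1", names(DF2))  # "sum_b" "sum_a" "sum_b" "sum_a"

# Plain character index: tapply returns the groups in sorted order
sorted_sums <- t(apply(DF2, 1, tapply, nms2, sum))
colnames(sorted_sums)    # "sum_a" "sum_b"

# Factor with levels in order of first appearance keeps the input order
f <- factor(nms2, levels = unique(nms2))
ordered_sums <- t(apply(DF2, 1, tapply, f, sum))
colnames(ordered_sums)   # "sum_b" "sum_a"

cbind(DF2, ordered_sums)
```

The sums themselves are the same either way; only the order of the new columns changes.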