Summing Multiple Groups of Columns

本秂侑毒 提交于 2019-12-05 02:23:07

We could split the 'class_df' by 'class', loop through the list elements with map, select the columns of 'class_df' and get the rowSums

library(tidyverse)
class_fg %>%
    split(.$fg) %>% 
    map_df(~ class_df %>%
                select(one_of(.x$class)) %>% 
                rowSums) %>%
    bind_cols(class_df, .)
#  id    A   B    C    D    E   F   X   Y   Z
#1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

Or do a group by nesting, and then do the rowSums by mapping over the list elements

class_fg %>% 
   group_by(fg) %>%
   nest %>%
   mutate(out = map(data, ~  class_df %>%
                               select(one_of(.x$class)) %>% 
                               rowSums)) %>% 
   select(-data)  %>%
   unnest %>% 
   unstack(., out ~ fg) %>% 
   bind_cols(class_df, .)

Always it is easier to work on data in long format. Hence, change class_df to long format using tidyr:gather and join with class_fg. Perform analysis in long format on your data. Finally, spread in wide-format to match expected result.

library(tidyverse)

class_df %>% gather(key, value, -id) %>% 
  inner_join(class_fg, by=c("key" = "class")) %>%
  group_by(id, fg) %>%
  summarise(value = sum(value)) %>%
  spread(fg, value) %>%
  inner_join(class_df, by="id") %>% as.data.frame()

#   id   X   Y   Z    A   B    C    D    E   F
# 1  1 0.0 0.4 0.6 0.20 0.3 0.10 0.15 0.25 0.0
# 2  2 0.4 0.4 0.2 0.05 0.1 0.05 0.30 0.10 0.4
# 3  3 0.3 0.4 0.3 0.10 0.1 0.10 0.20 0.20 0.3

Data:

class_fg <- read.table(text = 
"class         fg
                 A          Z
                 B          Z
                 C          Z
                 D          Y
                 E          Y
                 F          X",
header = TRUE, stringsAsFactors = FALSE)

class_df  <- read.table(text = 
"id    A    B    C    D    E    F
1 0.20 0.30 0.10 0.15 0.25 0.00 
2 0.05 0.10 0.05 0.30 0.10 0.40
3 0.10 0.10 0.10 0.20 0.20 0.30",
header = TRUE, stringsAsFactors = FALSE)

Yet another option, along with the already contributed working solutions, would be to use quasiquotation with the rlang package to build expressions to calculate the sums in each group.

library(tidyverse)

First, define a helper function for doing an elementwise sum of vectors:

psum <- function(...) reduce(list(...), `+`)

Extracting the groupings into a list from class_fg we can then construct a list of expressions to calculate the sum in each group:

sum_exprs <- with(class_fg, split(class, fg)) %>% 
  map(~ rlang::expr(psum(!!!rlang::syms(.x))))

sum_exprs
#> $X
#> psum(F)
#> 
#> $Y
#> psum(D, E)
#> 
#> $Z
#> psum(A, B, C)

With the list of expressions ready, we can just "bang-bang-bang" (!!!) them into the data with mutate:

class_df %>%
  mutate(!!!sum_exprs)
#>   id    A   B    C    D    E   F   X   Y   Z
#> 1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

(I used the code provided by @MKR in his answer to create the data).

Created on 2018-05-22 by the reprex package (v0.2.0).

My usual approach is to stick to base R as long as the data sets don't get too large. In your case, a base R solution would be:

class_df=as.data.frame(
  c(class_df,
    lapply(split(class_fg,class_fg$fg),
           function(x) rowSums(class_df[,x$class,drop=FALSE]))))
class_df
#  id    A   B    C    D    E   F   X   Y   Z
#1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

If the data sets get too large, I use data.table. A data.table solution for your problem:

library(data.table)

class_dt=data.table(class_df)
grps=split(class_fg,class_fg$fg)

for (g in grps) class_dt[,c(g$fg[1]):=rowSums(.SD),.SDcols=g$class,]
class_dt
#   id    A   B    C    D    E   F   X   Y   Z
#1:  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2:  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3:  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

Another tidyverse solution using rowSums on column subsets :

library(tidyverse)
class_fg %>%
  group_by(fg) %>% 
  summarize(list(rowSums(class_df[class]))) %>%
  spread(1,2) %>%
  unnest() %>%
  bind_cols(class_df, .)

#>   id    A   B    C    D    E   F   X   Y   Z
#> 1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

Or for the glory of code golf :

x <- with(class_fg, tapply(class, fg, reformulate))
mutate(class_df, !!!map(x, ~as.list(.)[[2]]))
#>   id    A   B    C    D    E   F   X   Y   Z
#> 1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!