I am trying to find a better way to run a mutate()
on a combination of columns based on parts of the column names.
For example, a way to simplify the mutate
function in the following code:
df <- data.frame(LIMITED_A = c(100,200),
UNLIMITED_A = c(25000,50000),
LIMITED_B = c(300,300),
UNLIMITED_B = c(500,500),
LIMITED_C = c(2,10),
UNLIMITED_C = c(5,20))
df %>%
mutate(FINAL_LIMITED = (LIMITED_A - LIMITED_B) / LIMITED_C,
FINAL_UNLIMITED = (UNLIMITED_A - UNLIMITED_B) / UNLIMITED_C)
A formula with the form: (._A - ._B) / ._C
and the result is given the name FINAL_.
Is there a way to simplify this to a single line of code in the mutate
function?
Here is a different approach:
library(dplyr)
library(rlang)
library(glue)
dynamic_mutate = function(DF,
col_names = gsub("(.*)_\\w+$", "\\1", names(DF)),
expression = "({x}_A - {x}_B)/{x}_C",
prefix = "FINAL"){
name_list = col_names %>%
unique() %>%
as.list()
expr_list = name_list %>%
lapply(function(x) parse_quosure(glue(expression))) %>%
setNames(paste(prefix, name_list, sep = "_"))
DF %>% mutate(!!!expr_list)
}
Result:
> df %>%
+ dynamic_mutate()
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
FINAL_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("LIMITED", "UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
NEW_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_UNLIMITED
1 100 25000 300 500 2 5 4900
2 200 50000 300 500 10 20 2475
> df %>%
+ dynamic_mutate(c("A", "B", "C"), "LIMITED_{x} + UNLIMITED_{x}")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_A FINAL_B FINAL_C
1 100 25000 300 500 2 5 25100 800 7
2 200 50000 300 500 10 20 50200 800 30
Notes:
This approach uses lapply
and glue
to construct expressions from prefixes extracted using gsub
(or you can supply your own prefixes/suffixes). parse_quosure
from rlang
is then used to parse the expression into a quosure
. As a result, expr_list
is a named list of quosure
's which I can then use !!!
to unquote and splice the arguments into separate expressions in mutate
.
You can change the formula by adjusting the expression
argument as shown in the last example.
The advantage of this method is that it is quite fast because I am mainly manipulating column names and creating strings (expressions). The disadvantage is that it uses multiple packages.
One idea is to convert the data frame to long format and conduct the calculation.
library(dplyr)
library(tidyr)
df2 <- df %>%
mutate(ID = 1:n()) %>%
gather(Type, Value, -ID) %>%
separate(Type, into = c("Type", "Group")) %>%
spread(Group, Value) %>%
mutate(Final = (A - B)/C)
df2
ID Type A B C Final
1 1 LIMITED 100 300 2 -100
2 1 UNLIMITED 25000 500 5 4900
3 2 LIMITED 200 300 10 -10
4 2 UNLIMITED 50000 500 20 2475
And you can always convert the data frame back to the wide format.
df3 <- df2 %>%
gather(Group, Value, A:Final) %>%
unite(Col, Type, Group) %>%
spread(Col, Value) %>%
select(colnames(df),
FINAL_LIMITED = LIMITED_Final,
FINAL_UNLIMITED = UNLIMITED_Final)
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED FINAL_UNLIMITED
1 100 25000 300 500 2 5 -100 4900
2 200 50000 300 500 10 20 -10 2475
来源:https://stackoverflow.com/questions/46832436/dplyr-mutate-formula-based-on-similarities-in-column-names