How to mutate_at multiple columns on a condition on each value?

送分小仙女 submitted on 2021-02-19 05:55:28

Question


I have a data frame with over 1 million rows and one column for each hour of the day. I want to modify every value in those columns, but the modification depends on the sign of the value. How can I do that efficiently?

I could gather the hourly values (and spread them afterwards), but gather seems to be quite slow on big data frames. I could also write the same mutate for all 24 columns, but that does not seem like a great solution when mutate_at looks able to do exactly this.

I'll probably need this kind of mutate again in the near future, and I hope to find something better than repetitive, tedious-to-read code.

library(data.table)

df = data.table(
    "ID" = c(1,1,1,2,2), # not relevant to the transformation
    "Date" = c(1,2,3,1,2), # not relevant to the transformation
    "total_neg" = c(1,1,0,0,2),
    "total_pos" = c(4,5,2,4,5),
    "H1" = c(5,4,0,5,-5),
    "H2" = c(5,-10,5,5,-5),
    "H3" = c(-10,6,5,0,10)
)

I want to apply something like

df%>%
  mutate_at(c("H1", "H2", "H3"), FUN(ifelse( Hour < 0, Hour*total_neg/10, Hour*total_pos/10)))

Here Hour stands for the value in each column. This obviously doesn't work as written, and neither does ".". I'm looking for something that means "any value in the columns selected in mutate_at".

If it helps, I'm currently denormalizing some values using the sums of the actual positive and negative values, which are stored in two columns (total_pos and total_neg).

In my example, this would be the expected result:

df = data.table(
    "ID" = c(1,1,1,2,2),
    "Date" = c(1,2,3,1,2),
    "total_neg" = c(1,1,0,0,2),
    "total_pos" = c(4,5,2,4,5),
    "H1" = c(2,2,0,2,-1),
    "H2" = c(2,-1,1,2,-1),
    "H3" = c(-1,3,1,0,5)
)
df

Thanks in advance for any help you can provide. I apologize for any mistakes; I'm not a native speaker, but I assure you that I do my best!


Answer 1:


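FUN is not an argument of mutate_at; the function is passed through .funs. In recent versions of dplyr (0.8.0 and later), the previously used funs() is deprecated in favor of list(~ ...) or simply a ~ formula. Also, wrap the columns to select in vars(). The column names can be unquoted as well, or you can use vars(starts_with("H")) or vars(matches("^H\\d+$")). Finally, replace 'Hour' with ., which stands for the current column inside the formula.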

library(dplyr)
df %>%
    mutate_at(vars(c("H1", "H2", "H3")),
              ~ ifelse(. < 0, . * total_neg / 10, . * total_pos / 10))
#  ID Date total_neg total_pos H1 H2 H3
#1  1    1         1         4  2  2 -1
#2  1    2         1         5  2 -1  3
#3  1    3         0         2  0  1  1
#4  2    1         0         4  2  2  0
#5  2    2         2         5 -1 -1  5
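
As a side note, in dplyr 1.0.0 and later mutate_at() is superseded by across(). The following is a minimal sketch of the same transformation under that assumption (dplyr >= 1.0.0 installed). It rebuilds the question's sample data as a plain data.frame so that it runs on its own; the name result is only illustrative and not part of the original answer.

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  ID        = c(1, 1, 1, 2, 2),
  Date      = c(1, 2, 3, 1, 2),
  total_neg = c(1, 1, 0, 0, 2),
  total_pos = c(4, 5, 2, 4, 5),
  H1        = c(5, 4, 0, 5, -5),
  H2        = c(5, -10, 5, 5, -5),
  H3        = c(-10, 6, 5, 0, 10)
)

# Select every column named H<number>; .x is the current column.
# Negative values are scaled by total_neg, all others by total_pos.
result <- df %>%
  mutate(across(
    matches("^H\\d+$"),
    ~ ifelse(.x < 0, .x * total_neg / 10, .x * total_pos / 10)
  ))

print(result)
```

Because the columns are selected by pattern, the same call should cover all 24 hourly columns without listing them one by one.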


Source: https://stackoverflow.com/questions/57329163/how-to-mutate-at-multiple-columns-on-a-condition-on-each-value
