How to substract multiple column based on condition in R

旧城冷巷雨未停 提交于 2021-02-10 14:32:32

问题


In my new dataset, I want to subtract the interest column(s) with interestpaid column(s) and interestunpaid column(s). For example, if there is 3 interest column and 2 interestpaid column and 2 interestunpaid column then the first interest column must be subtracted from the first interestpaid column and interestunpaid column, 2nd with 2nd and 3rd with 3rd. The last remaining 1 interest columns must be as it is.

interest1  interest2  interest3  interestpaid1  interestpaid2  interestunpaid1 interestunpaid2
100         100        150         50             50              10              20
200         150        100        100            100              20              30
300         250        10         100            150              30              40
400         100        200        100            150              40              50

My output should be be calculated as mentioned below. Below, I = Interest, IP= InterestPaid, IUP=InterestUnPaid

output1               output2                  output3
40(I1-IP1-IUP1)     30(I2-IP2-IUP2)            150 (I3)
80                  20                         100
170                 60                         10
260                 -100                       200

回答1:


One dplyr and purrr option could be:

map_dfc(.x = unique(sub("\\D+", "", names(df))),
        ~ df %>%
         transmute(!!paste0("output", .x) := reduce(across(ends_with(.x)), `-`)))

  output1 output2 output3
1      40      30     150
2      80      20     100
3     170      60      10
4     260    -100     200



回答2:


k=3

I=paste0("interest",1:k)
IP=paste0("interestpaid",1:k)
IUP=paste0("interestunpaid",1:k)

df$nothing=0

df[ifelse(I %in% colnames(df),I,"nothing")]-
df[ifelse(IP %in% colnames(df),IP,"nothing")]-
df[ifelse(IUP %in% colnames(df),IUP,"nothing")]

and the result

  interest1 interest2 interest3
1        40        30       150
2        80        20       100
3       170        60        10
4       260      -100       200



回答3:


This does what you ask, but I would think about making the data long so you don't have to specify each output, but just use group_by.

library(tidyr)
library(dplyr)

Lines <- "interest1  interest2  interest3  interestpaid1  interestpaid2  interestunpaid1 interestunpaid2
100         100        150         50             50              10              20
200         150        100        100            100              20              30
300         250        10         100            150              30              40
400         100        200        100            150              40              50
"
data <-read.table(text = Lines, header = TRUE)

output <- mutate(data, output1 = interest1 - interestpaid1 - interestunpaid1,
                       output2 = interest2 - interestpaid2 - interestunpaid2,
                       output3 = interest3) %>%
  select(starts_with("output"))

Not the shortest way, but will also work if the number of interest variables change

output <- mutate(data, obs = row_number()) %>%
  pivot_longer(cols = 1:(ncol(data)), names_to = c("variable", "num"), names_sep = -1) %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(across(starts_with("interest"), ~replace_na(.x, 0))) %>%
  mutate(output = interest - interestpaid - interestunpaid) %>%
  mutate(num = paste("output",num)) %>%
  select(obs, num, output) %>%
  pivot_wider(id_cols = obs, names_from = num, values_from = output)
  


来源:https://stackoverflow.com/questions/66026627/how-to-substract-multiple-column-based-on-condition-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!