How to create a new variable with values from different variables if another variable equals a set value in R?

天涯浪子 提交于 2019-12-25 18:58:22

问题


I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables:

df$Id <- c(1:12)
df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a)
df$va <- c(1.1, 1.4, 2.5, ...)     #12 randoms values
df$vb <- c(5.9, 2.3, 4.7, ...)     #12 other random values
df$vc <- c(3.0, 3.3, 3.7, ...)     #12 more random values

Then I want to create a new variable that takes the value from va, vb, or vc if the date is equal to a, b, or c. I had tried a nested if-else, which did not work. I also tried:

df$new[df$date=='a' & !is.na(df$date)] <- df$va
df$new[df$date=='b' & !is.na(df$date)] <- df$vb
df$new[df$date=='c' & !is.na(df$date)] <- df$vc

This correctly left NA's in the new variable where Date=NA, however the values provided were not from va, vb, or vc, but some other value altogether. How can I get df$new to equal va if the date is 'a', vb if the date is 'b', and vc if the date is 'c'?


回答1:


Try

library(dplyr)
df %>% 
    mutate(new = (Date=="a")*va + (Date=="b")*vb + (Date=="c")*vc)
#   Id Date         va        vb         vc       new
#1   1 <NA> 0.26550866 0.6870228 0.26722067        NA
#2   2 <NA> 0.37212390 0.3841037 0.38611409        NA
#3   3    a 0.57285336 0.7698414 0.01339033 0.5728534
#4   4    a 0.90820779 0.4976992 0.38238796 0.9082078
#5   5    b 0.20168193 0.7176185 0.86969085 0.7176185
#6   6 <NA> 0.89838968 0.9919061 0.34034900        NA
#7   7 <NA> 0.94467527 0.3800352 0.48208012        NA
#8   8    b 0.66079779 0.7774452 0.59956583 0.7774452
#9   9    c 0.62911404 0.9347052 0.49354131 0.4935413
#10 10    c 0.06178627 0.2121425 0.18621760 0.1862176
#11 11    b 0.20597457 0.6516738 0.82737332 0.6516738
#12 12    a 0.17655675 0.1255551 0.66846674 0.1765568

Or,

library(data.table)
setDT(df)[,new:= (Date=="a")*va + (Date=="b")*vb + (Date=="c")*vc,]

Data

set.seed(1)
df <- data.frame(Id = 1:12,
                 Date = c(NA,NA,"a","a","b",NA,NA,"b","c","c","b","a"),
                 va = runif(12),
                 vb = runif(12),
                 vc = runif(12), stringsAsFactors = FALSE)



回答2:


You may also do row/column indexing using base R (using the data from @ExperimenteR). Even if there are 100's of unique 'Date' with corresponding 'v' columns, we may not need to change the code especially the cbind(..) part.

  df$new <- df[-(1:2)][cbind(1:nrow(df),match(df$Date, sort(unique(df$Date))))]
  df
  #   Id Date         va        vb         vc       new
  #1   1 <NA> 0.26550866 0.6870228 0.26722067        NA
  #2   2 <NA> 0.37212390 0.3841037 0.38611409        NA
  #3   3    a 0.57285336 0.7698414 0.01339033 0.5728534
  #4   4    a 0.90820779 0.4976992 0.38238796 0.9082078
  #5   5    b 0.20168193 0.7176185 0.86969085 0.7176185
  #6   6 <NA> 0.89838968 0.9919061 0.34034900        NA
  #7   7 <NA> 0.94467527 0.3800352 0.48208012        NA
  #8   8    b 0.66079779 0.7774452 0.59956583 0.7774452
  #9   9    c 0.62911404 0.9347052 0.49354131 0.4935413
  #10 10    c 0.06178627 0.2121425 0.18621760 0.1862176
  #11 11    b 0.20597457 0.6516738 0.82737332 0.6516738
  #12 12    a 0.17655675 0.1255551 0.66846674 0.1765568



回答3:


I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be:

df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a' & !is.na(df$date)]
df$new[df$date=='b' & !is.na(df$date)] <- df$vb[df$date=='b' & !is.na(df$date)]
df$new[df$date=='c' & !is.na(df$date)] <- df$vc[df$date=='c' & !is.na(df$date)]

Alternatively, another user noted there is a way to use ifelse, which can be viewed as the correct answer here: https://stats.stackexchange.com/questions/151345/how-to-create-a-new-variable-with-values-from-different-variables-if-another-var

As I added to his answer at that link, I found what worked better was to replace the == with %in%, so that it created a numeric variable instead of a list with a row for each of the 36121 observations in my dataset (12 in the example I provided). That would look like:

df$new[df$date %in% 'a' & !is.na(df$date)] <- df$va[df$date %in% 'a' & !is.na(df$date)]
df$new[df$date %in% 'b' & !is.na(df$date)] <- df$vb[df$date %in% 'b' & !is.na(df$date)]
df$new[df$date %in% 'c' & !is.na(df$date)] <- df$vc[df$date %in% 'c' & !is.na(df$date)]


来源:https://stackoverflow.com/questions/30113308/how-to-create-a-new-variable-with-values-from-different-variables-if-another-var

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!