R: conditionally replace values in loop

蹲街弑〆低调 提交于 2019-12-08 11:13:00

问题


I have a dataframe like so:

rel      <- c(2, 5, NA, 3, 6)
year.in  <- c(4, NA, 2, 3, 2)
year.out <- c(6, 7, NA, 5, 4)
year.1   <- c(NA, NA, NA, NA, NA)
year.2   <- c(NA, NA, NA, NA, NA)
year.3   <- c(NA, NA, NA, NA, NA)
year.4   <- c(NA, NA, NA, NA, NA)
year.5   <- c(NA, NA, NA, NA, NA) 

df <- as.data.frame(cbind(rel, year.in, year.out, year.1, year.2, year.3, 
                      year.4, year.5))

What I would like to do is update the missing values in year.1 - year.5 with the value of 'rel', but only if: (year.in >= year.i AND year.out <= year.i) (with i is 1:5)

Focussing on the just the year of entry, I came up with this:

for (i in 1:5) ifelse(df$year.in < i,
    df[paste("year", i, sep= ".")]<- NA,
    df[paste("year", i, sep= ".")]<- df["rel"])

But this merely replaces all year.i variables with the value of rel.

I have two questions:

  • how can I update the year.i variables with the 'rel' values on the conditions mentioned?

  • is it bad to use the if else statement here?

Best and thanks in advance,

Richard


回答1:


library(data.table)
dt = data.table(df)

for(i in 1:5) dt[year.in <= i & i <= year.out, paste0('year.', i) := rel]

dt
#   rel year.in year.out year.1 year.2 year.3 year.4 year.5
#1:   2       4        6     NA     NA     NA      2      2
#2:   5      NA        7     NA     NA     NA     NA     NA
#3:  NA       2       NA     NA     NA     NA     NA     NA
#4:   3       3        5     NA     NA      3      3      3
#5:   6       2        4     NA      6      6      6     NA



回答2:


I'd melt your data using the reshape2 package:

library(reshape2)    
df.melt <- melt(df, id.vars=c('rel', 'year.in', 'year.out'))

Dig out the numeric year:

df.melt$year <- as.integer(gsub('year\\.', '', df.melt$variable))

Then use vectorized operations:

subsetter <- with(df.melt, year.in >= year & year.out <= year.out)
subsetter[is.na(subsetter)] <- FALSE
df.melt$value[subsetter] <- df.melt$rel[subsetter]

However, in your example, everything fails your condition.

Using ifelse is perfectly acceptable, however, don't do assignment inside. Instead assign its result to something, like below. The issue was that the two assignments you were performing inside the ifelse were not on each subset but instead acting as though they were just run independently.

for (i in 1:5) {
  year_col <- paste('year', i, sep='.')
  df[[year_col]] <- ifelse(df$year.in >= i & df$year.out <= i,
                         df$rel,
                         df[[year_col]])
}

To answer your bullets:

  • See above.

  • There is nothing specifically wrong with using ifelse and sometimes it is handy to do so for readability. However, it is a "looping" construct and thus can often be replaced by a more efficient vectorized solution.



来源:https://stackoverflow.com/questions/17663515/r-conditionally-replace-values-in-loop

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!