If data present, replace with data from another column based on row ID

Question

I would like to replace the data in columns rep1 to rep4. The entries in these columns match the unique IDs in the first column. I want to replace each entry in rep1-rep4 with the number in the value column of the row whose ID matches that entry. So, for the second row "b", I want to replace "a" in column "rep1" with the value from row "a", in this case -400.

ID  rep1  rep2  rep3  rep4  value
a                           -400
b   a                       -300
c   a     b                 -200
d   a     b     c           -300
e   a     b     c     d     -400
f                           -400
g   f                       -400
h                           -400
i                           -200
j   k     l                 -300
k   l                       -200
l                           -300
m                           -300

It seems like using ifelse(!is.na()) might be able to do something here, but I'm not sure how to match the ID data in columns rep1 to rep4 to the corresponding row in the ID column, identifying what data in "value" is supposed to be used in the replacement. Can this be done in the same dataframe, or does it need to be split into two different dataframes to work?

Here is the data using dput()

structure(list(ID = structure(1:13, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"), 
    rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L, 
    5L, 1L, 1L), .Label = c("", "a", "f", "k", "l"), class = "factor"), 
    rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
    1L, 1L, 1L), .Label = c("", "b", "l"), class = "factor"), 
    rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
    "d"), class = "factor"), value = c(-400L, -300L, -200L, -300L, 
    -400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L, -300L
    )), class = "data.frame", row.names = c(NA, -13L))

Answer 1:

Here is a variant using the tidyverse (dplyr):

library(dplyr)  # provides %>%, mutate_at() and vars()
df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])

Explanation:

mutate_at lets you select a range of variables to modify.
The ~ ... notation (a quosure-style lambda) lets you write an expression in which . (dot) stands for the column being modified. Otherwise you would have to write function(x) df$value[match(x, df$ID)], which is a lot more to type.
vars() is needed in mutate_at to select columns without quotes (otherwise you would have to use 2:5 or paste0("rep", 1:4)).
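
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). The following is a minimal sketch of the same lookup written that way, assuming a recent dplyr is installed. The data frame df_small is a hypothetical three-row subset of the question's data, not the full table.

```r
library(dplyr)

# Hypothetical subset of the question's data (first three rows, two rep columns)
df_small <- data.frame(
  ID    = c("a", "b", "c"),
  rep1  = c(NA, "a", "a"),
  rep2  = c(NA, NA, "b"),
  value = c(-400L, -300L, -200L),
  stringsAsFactors = FALSE
)

# The lambda is evaluated with the data frame's columns in scope,
# so value and ID refer to df_small$value and df_small$ID
res <- df_small %>%
  mutate(across(rep1:rep2, ~ value[match(.x, ID)]))

res
```

Here .x plays the same role as the dot in the mutate_at version: it stands for whichever column is currently being rewritten.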

Answer 2:

A base R approach is to find the columns to replace (here, those whose names start with rep), unlist them into one vector, match that vector against ID, and assign the corresponding entries of value back to those columns.

cols <- grep("^rep", names(df))
df[cols] <- df$value[match(unlist(df[cols]), df$ID)]

df
#   ID rep1 rep2 rep3 rep4 value
#1   a   NA   NA   NA   NA  -400
#2   b -400   NA   NA   NA  -300
#3   c -400 -300   NA   NA  -200
#4   d -400 -300 -200   NA  -300
#5   e -400 -300 -200 -300  -400
#6   f   NA   NA   NA   NA  -400
#7   g -400   NA   NA   NA  -400
#8   h   NA   NA   NA   NA  -400
#9   i   NA   NA   NA   NA  -200
#10  j -200 -300   NA   NA  -300
#11  k -300   NA   NA   NA  -200
#12  l   NA   NA   NA   NA  -300
#13  m   NA   NA   NA   NA  -300
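
To see why the two lines of base R above work, it may help to unpack them into separate steps. This is only an illustrative sketch on a hypothetical three-row subset; the names ids, values, reps, flat, pos and looked_up are made up for the example.

```r
# Hypothetical subset of the question's data
ids    <- c("a", "b", "c")
values <- c(-400L, -300L, -200L)
reps   <- data.frame(rep1 = c(NA, "a", "a"),
                     rep2 = c(NA, NA, "b"),
                     stringsAsFactors = FALSE)

# 1. Flatten the rep columns into one vector, column by column
flat <- unlist(reps)

# 2. For every entry, find the position of the matching ID (NA if none)
pos <- match(flat, ids)

# 3. Use those positions to pull out the corresponding values
looked_up <- values[pos]

# 4. Assign the flat vector back; it is refilled column by column,
#    which is the same order unlist() produced
reps[c("rep1", "rep2")] <- looked_up

reps
```

Because unlist() and the assignment both work in column order, each looked-up number lands back in the cell its ID came from.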

data

df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m"), rep1 = c(NA, "a", "a", "a", "a", NA, 
"f", NA, NA, "k", "l", NA, NA), rep2 = c(NA, NA, "b", "b", "b", 
NA, NA, NA, NA, "l", NA, NA, NA), rep3 = c(NA, NA, NA, "c", "c", 
NA, NA, NA, NA, NA, NA, NA, NA), rep4 = c(NA, NA, NA, NA, "d", 
NA, NA, NA, NA, NA, NA, NA, NA), value = c(-400L, -300L, -200L, 
-300L, -400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L, 
-300L)), class = "data.frame", row.names = c(NA, -13L))
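
Note that the question's dput() stores ID and rep1 to rep4 as factors with empty strings for the blank cells, whereas the data above uses character columns with NA. A small sketch, using hypothetical vectors id_f and rep_f, suggests the same match() lookup should behave the same way on the factor form, since match() compares factors by their labels and an empty string simply finds no matching ID.

```r
# Hypothetical factor versions of ID and rep1, as in the question's dput()
id_f  <- factor(c("a", "b", "c"))
rep_f <- factor(c("", "a", "a"))
value <- c(-400L, -300L, -200L)

# match() compares the factor labels; "" is not an ID, so it yields NA
replaced <- value[match(rep_f, id_f)]

replaced
```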

Source: https://stackoverflow.com/questions/57000883/if-data-present-replace-with-data-from-another-column-based-on-row-id

Tags

dataframe

if-statement

replace