问题
I would like to replace data in columns rep1 to rep4. The data in these columns match unique ID's in the first column. I want to replace the data in columns rep1-rep4 with data in the value column with the corresponding ID row. So, for the second row "b", I want to replace "a" in the column "rep1" with the corresponding value in row "a", in this case, -400.
ID rep1 rep2 rep3 rep4 value
a -400
b a -300
c a b -200
d a b c -300
e a b c d -400
f -400
g f -400
h -400
i -200
j k l -300
k l -200
l -300
m -300
It seems like using ifelse(!is.na())
might be able to do something here, but I'm not sure how to match the ID data in columns rep1 to rep4 to the corresponding row in the ID column, identifying what data in "value" is supposed to be used in the replacement. Can this be done in the same dataframe, or does it need to be split into two different dataframes to work?
Here is the data using dput()
structure(list(ID = structure(1:13, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"),
rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L,
5L, 1L, 1L), .Label = c("", "a", "f", "k", "l"), class = "factor"),
rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
1L, 1L, 1L), .Label = c("", "b", "l"), class = "factor"),
rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"d"), class = "factor"), value = c(-400L, -300L, -200L, -300L,
-400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L, -300L
)), class = "data.frame", row.names = c(NA, -13L))
回答1:
Here a variant with tidyverse:
df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])
Explanation:
mutate_at
allows to select a range of variables to be modified- the
~ ... .
(quosure style lambda notation) allows to use an expression in which.
(dot) stands for the column to be modified. Otherwise you would have to usefunction(x) df$value[match(x, df$ID)]
, which is a lot to type. vars()
are necessary inmutate_at
to be able to select columns without quotes (otherwise you would need to use2:5
orpaste0("rep", 1:4)
).
回答2:
A base R way would be to identify names of the column which we want to match (here rep
), then unlist
them and match
with ID
and replace them with corresponding value
.
cols <- grep("^rep", names(df))
df[cols] <- df$value[match(unlist(df[cols]), df$ID)]
df
# ID rep1 rep2 rep3 rep4 value
#1 a NA NA NA NA -400
#2 b -400 NA NA NA -300
#3 c -400 -300 NA NA -200
#4 d -400 -300 -200 NA -300
#5 e -400 -300 -200 -300 -400
#6 f NA NA NA NA -400
#7 g -400 NA NA NA -400
#8 h NA NA NA NA -400
#9 i NA NA NA NA -200
#10 j -200 -300 NA NA -300
#11 k -300 NA NA NA -200
#12 l NA NA NA NA -300
#13 m NA NA NA NA -300
data
df <- structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m"), rep1 = c(NA, "a", "a", "a", "a", NA,
"f", NA, NA, "k", "l", NA, NA), rep2 = c(NA, NA, "b", "b", "b",
NA, NA, NA, NA, "l", NA, NA, NA), rep3 = c(NA, NA, NA, "c", "c",
NA, NA, NA, NA, NA, NA, NA, NA), rep4 = c(NA, NA, NA, "MA", "d",
NA, NA, NA, NA, NA, NA, NA, NA), value = c(-400L, -300L, -200L,
-300L, -400L, -400L, -400L, -400L, -200L, -300L, -200L, -300L,
-300L)), class = "data.frame", row.names = c(NA, -13L))
来源:https://stackoverflow.com/questions/57000883/if-data-present-replace-with-data-from-another-column-based-on-row-id