问题
I have data which looks like this
Linking <- data.frame(
ID = c(round((runif(20, min = 10000, max = 99999)), digits = 0), NA, NA, NA, NA),
PSU = c(paste("A", round((runif(20, min = 10000, max = 99999)), digits = 0), sep = ''), NA, NA, NA, NA),
qtr = c(rep(1:10, 2), NA, NA, NA, NA)
)
Linking$Key <- paste(Linking$ID, Linking$PSU, Linking$qtr, sep = "_")
Linking$Key[c(21:24)] <- c("87654_A15467_1", "45623_A23456_2", "67891_A12345_4", "65346_A23987_7")
What I want to do is populate the NA values for ID, PSU, and qtr from the information from "Key", but only for the rows with NA values.
Does anyone know how to do this?
This code does what I want, but it does it for all values of each variable. I want to do this just for rows where the values are NA.
Linking2 <- Linking
Linking2$ID <- substr(Linking$Key,1,5)
Linking2$PSU <- substr(Linking$Key,7,12)
Linking2$qtr <- substr(Linking$Key, 14,15)
回答1:
The basic idea here is to assign using a logical index vector.
Linking$ID[is.na(Linking$ID)] <- substr(Linking$Key,1,5)[is.na(Linking$ID)]
回答2:
You can also use tidyr::separate
with dplyr::coalesce
to separate values from Key
and fill NA
values in the first three columns.
library(tidyverse);
Linking %>%
separate(Key, into = paste0("tmp", 1:3), sep = "_") %>%
mutate(ID = coalesce(tmp1), PSU = coalesce(tmp2), qtr = coalesce(tmp3)) %>%
select(-tmp1, -tmp2, -tmp3);
# ID PSU qtr
#1 56421 A20914 1
#2 30912 A97582 2
#3 97547 A73397 3
#4 28277 A35255 4
#5 45107 A14867 5
#6 91099 A26196 6
#7 21117 A69721 7
#8 69065 A34741 8
#9 28781 A96847 9
#10 26098 A93942 10
#11 12680 A15705 1
#12 35231 A68219 2
#13 70556 A47693 3
#14 98281 A55697 4
#15 52687 A95023 5
#16 46129 A95540 6
#17 48721 A78827 7
#18 77989 A36778 8
#19 11757 A29458 9
#20 16575 A78892 10
#21 87654 A15467 1
#22 45623 A23456 2
#23 67891 A12345 4
#24 65346 A23987 7
来源:https://stackoverflow.com/questions/49621647/populate-the-na-values-in-a-variable-with-values-from-a-different-variables-in-r