问题
First, let's start with DataFrame 1 (DF1) :
DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")
I also have DataFrame 2 (DF2) :
DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
c(1, 1, 2, 2),
c(9999, 8888, 777, 555),
c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")
For every rows in DF1, I have to look if there is a row in DF2 that has the same date and id. If yes, I have to replace the values in DF1 by the values in DF2.
DF2 will always have less columns than DF1. If a column is not in DF2, I must keep the original value that was in DF1 for that specific column.
The final output would like this:
results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")
Do you have any suggestions?
回答1:
You could use the join functionality of the data.table-package for this:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
which gives:
> DF1 date id sales cost city 1: 06/19/2016 1 9999 101 LON 2: 06/20/2016 1 150 102 MTL 3: 06/21/2016 1 151 104 MTL 4: 06/22/2016 1 152 107 MTL 5: 06/23/2016 1 155 99 MTL 6: 06/19/2016 2 84 55 NY 7: 06/20/2016 2 83 55 NY 8: 06/21/2016 2 80 56 NY 9: 06/22/2016 2 777 57 QC 10: 06/23/2016 2 555 58 QC
When you have many columns in both datasets, it is easier to use mget
instead off typing all the column names. For the used data in the question it would look like:
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
回答2:
df <- merge(DF1, DF2, by = c("date", "id"), all.x=TRUE)
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y),]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y),]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"
results <- rbindlist(list(tmp1,tmp2), use.names= TRUE, fill = TRUE)

回答3:
df <- merge(DF1, DF2, by = c("date", "id"))
df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y, all.x = TRUE)
Replace column
with your variable.
来源:https://stackoverflow.com/questions/37993924/replace-specific-values-based-on-another-dataframe