How to join and overwrite data appears to be a common request, but I have yet to find an elegant solution that applies to an entire dataset.
(Note: to simplify the d
I think it's easiest to go to long form:
md1 = melt(d1, id="id")
md2 = melt(d2, id="id")
Then you can stack them and take the latest value:
res1 = unique(rbind(md1, md2), by=c("id", "variable"), fromLast=TRUE)
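As a small check of the stack-and-deduplicate idea, here is a sketch on made-up data. The tables d1 and d2 and their columns a and b are hypothetical stand-ins, not the data from the question. It assumes d1 is the base table and d2 holds the updates.

```r
library(data.table)

# Hypothetical toy tables: d1 is the base, d2 holds the updates
d1 <- data.table(id = 1:3, a = c(1, 2, 3), b = c(10, 20, 30))
d2 <- data.table(id = 2:3, a = c(200, 300), b = c(NA, 3000))

# Long form: one row per (id, variable) pair
md1 <- melt(d1, id.vars = "id")
md2 <- melt(d2, id.vars = "id")

# Base rows first, update rows last; fromLast = TRUE keeps the last
# occurrence of each (id, variable), i.e. the update when one exists
res1 <- unique(rbind(md1, md2), by = c("id", "variable"), fromLast = TRUE)
```

Note that the NA in d2 (id 2, column b) also overwrites the value from d1, because melt keeps NA rows by default. Rows with no counterpart in d2 (id 1) pass through unchanged.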
I'd also like to know how this can be done if you only want to update the NA values in d3, that is, make sure existing non-NA values are not overwritten.
You can exclude rows from the update table, md2, if they appear in md3:
md3 = melt(d3, id="id", na.rm=TRUE)  # drop NA cells so they stay open to the update
res3 = unique(rbind(md3, md2[!md3, on=.(id, variable)]),
              by=c("id", "variable"), fromLast=TRUE)
dcast can be used to go back to wide format if necessary, e.g., dcast(res3, id ~ ...).
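A sketch of the NA-only update, followed by the reshape back to wide, again on made-up data (d3, d2 and columns a and b are hypothetical). One assumption to flag: it passes na.rm = TRUE when melting d3, so that NA cells are absent from md3 and the anti-join leaves the matching update rows in place. It also spells out the dcast formula as id ~ variable rather than relying on `...`.

```r
library(data.table)

# Hypothetical toy tables: d3 is the base with gaps, d2 holds candidate values
d3 <- data.table(id = 1:3, a = c(1, NA, 3), b = c(NA, 20, 30))
d2 <- data.table(id = 1:3, a = c(100, 200, 300), b = c(1000, 2000, 3000))

md2 <- melt(d2, id.vars = "id")
# na.rm = TRUE (assumption of this sketch) drops the NA cells of d3
md3 <- melt(d3, id.vars = "id", na.rm = TRUE)

# Anti-join: keep only update rows whose (id, variable) has no value in d3
res3 <- unique(rbind(md3, md2[!md3, on = .(id, variable)]),
               by = c("id", "variable"), fromLast = TRUE)

# Back to wide format: one row per id, one column per variable
wide3 <- dcast(res3, id ~ variable, value.var = "value")
```

Here only the two NA cells of d3 (id 2 in a, id 1 in b) pick up values from d2. Every existing value in d3 is kept.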