Join and overwrite data in one table with data from another table

Posted by £可爱£侵袭症+ on 2019-12-01 09:28:01

I think it's easiest to go to long form:

md1 = melt(d1, id="id")
md2 = melt(d2, id="id")

Then you can stack them and take the latest value:

res1 = unique(rbind(md1, md2), by=c("id", "variable"), fromLast=TRUE)
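As a minimal sketch of the stack-and-deduplicate step, here is a run on made-up data. The contents of d1 and d2 below are assumptions for illustration, not the tables from the question. Note that an NA in d2 overwrites a real value in d1, because the last row for each (id, variable) pair wins.

```r
library(data.table)

# Hypothetical sample data (not from the original question)
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3))
d2 <- data.table(id = c(2L, 4L), x = c(200, 400), y = c(NA, 4))

# Long form: one row per (id, variable) cell
md1 <- melt(d1, id.vars = "id")
md2 <- melt(d2, id.vars = "id")

# Stack d1's cells first and d2's cells last, then keep the last row
# seen for each (id, variable) pair, so d2 wins wherever both have a cell
res1 <- unique(rbind(md1, md2), by = c("id", "variable"), fromLast = TRUE)

print(res1[order(id, variable)])
```

Here id 4 exists only in d2 and is carried into the result, while ids 1 and 3 keep their d1 values.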

From the question: "I'd also like to know how this can be done if you only want to update the NA values in [d3], that is, make sure existing non-NA values are not overwritten."

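You can exclude rows from the update table, md2, if they appear in md3. Melt d3 with na.rm=TRUE so that its NA cells are left out of md3 and can still be filled from md2:

md3 = melt(d3, id="id", na.rm=TRUE)

res3 = unique(rbind(md3, md2[!md3, on=.(id, variable)]),
  by=c("id", "variable"), fromLast=TRUE)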

dcast can be used to go back to wide format if necessary, e.g., dcast(res3, id ~ ...).
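A small sketch of the NA-only update followed by the trip back to wide format, again on invented data (d3 and d2 here are assumptions). It assumes the NA cells of d3 are dropped while melting (na.rm = TRUE), so that the anti-join removes only those cells of d2 for which d3 already holds a value.

```r
library(data.table)

# Hypothetical data: d3 has gaps (NA), d2 carries candidate replacements
d3 <- data.table(id = 1:3, x = c(10, NA, 30), y = c(NA, 2, 3))
d2 <- data.table(id = c(1L, 2L), x = c(100, 200), y = c(11, 22))

md3 <- melt(d3, id.vars = "id", na.rm = TRUE)  # keep only cells that hold a value
md2 <- melt(d2, id.vars = "id")

# Anti-join: drop the cells of md2 whose (id, variable) already exists in md3,
# then stack what is left underneath md3
res3 <- unique(rbind(md3, md2[!md3, on = .(id, variable)]),
               by = c("id", "variable"), fromLast = TRUE)

# Back to one row per id
wide3 <- dcast(res3, id ~ variable, value.var = "value")
print(wide3)
```

In this run the existing values (x = 10 for id 1, y = 2 for id 2) survive, and only the two gaps are filled from d2.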

Here's @Frank's solution from the comments. (Note: d1 and d2 must be data.tables first; setDT() converts a data.frame in place.)

library(data.table)
cols = setdiff(intersect(names(d1), names(d2)), "id") 
d1[d2, on=.(id), (cols) := mget(paste0("i.", cols))]
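To make the update join concrete, here is a sketch on made-up tables (the data is an assumption, and z is a hypothetical extra column that exists only in d1). The rows of d2 are deliberately out of order and include an id that d1 does not have.

```r
library(data.table)

# Hypothetical sample data; d2 is unordered and has an id (4) missing from d1
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3), z = c("p", "q", "r"))
d2 <- data.table(id = c(3L, 1L, 4L), x = c(300, 100, 400), y = c(33, 11, 44))

# Columns shared by both tables, excluding the join key
cols <- setdiff(intersect(names(d1), names(d2)), "id")

# Update join: for each row of d2, find the matching id in d1 and overwrite
# the shared columns with d2's values (referred to as i.x, i.y)
d1[d2, on = .(id), (cols) := mget(paste0("i.", cols))]

print(d1)
```

Because rows are matched by id rather than by position, the order of d2 does not matter. The join modifies d1 by reference and only touches existing rows, so id 4 is not added, and z is left alone.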

As he notes, the original solution I provided below is a bad idea generally speaking. If ids appear multiple times or in a different order, it will do the wrong thing.

d1[d1$id %in% d2$id, names(d2):=d2]
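A quick sketch of how the positional version goes wrong, using invented data (z is a hypothetical column present only in d1). The values of d2 are pasted into the selected rows in d2's own row order, with no matching on id.

```r
library(data.table)

# Hypothetical sample data; note that d2 lists id 3 before id 1
d1 <- data.table(id = 1:3, x = c(10, 20, 30), y = c(1, 2, 3), z = c("p", "q", "r"))
d2 <- data.table(id = c(3L, 1L), x = c(300, 100), y = c(33, 11))

# Positional assignment: rows 1 and 3 of d1 are selected, then filled with
# d2's rows in the order they appear in d2
d1[d1$id %in% d2$id, names(d2) := d2]

print(d1)
```

In this run the first row of d1 ends up holding id 3 while still carrying z = "p", which originally belonged to id 1. Any column that lives only in d1 is now attached to the wrong id.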

library("dplyr")

d12 <- anti_join(d1, d2, by = "id") %>%
         bind_rows(d2)

This solution takes the rows from d1 that aren't in d2, then adds the d2 rows on to them.
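Here is a small sketch of that approach on made-up data (the tables and the extra column z are assumptions). It also shows a side effect worth knowing: whole rows are replaced, so a column that exists only in d1 comes back as NA for the replaced ids.

```r
library(dplyr)

# Hypothetical sample data; z exists only in d1
d1 <- tibble(id = 1:3, x = c(10, 20, 30), z = c("p", "q", "r"))
d2 <- tibble(id = c(2L, 4L), x = c(200, 400))

d12 <- anti_join(d1, d2, by = "id") %>%  # rows of d1 whose id is not in d2
  bind_rows(d2) %>%                      # append every row of d2
  arrange(id)                            # optional: restore id order

print(d12)
```

The arrange() call is an addition for readability; without it the d2 rows simply sit at the bottom.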

This won't work for the 'Additional scenario', which looks much messier to resolve and should perhaps be a separate question.
