问题
I have two data.table(dt1 & dt2). dt1 is past product data and dt2 is present product data. I want to create a third data.table that inserts new rows from dt2 into dt1 only when product characteristics(Level or Color) are different or Product itself is different.
library(data.table)
dt1 <- fread('
    Product  Level   Color     ReviewDate
    A          0     Blue      9/7/2016
    B          1     Red       9/7/2016
    C          1     Purple    9/7/2016 
    D          2     Blue      9/7/2016
    E          1     Green     9/7/2016 
    F          4     Yellow    9/7/2016  
')
dt2 <- fread('
    Product  Level   Color     ReviewDate
    A          1     Black     9/8/2016
    B          1     Red       9/8/2016
    C          5     White     9/8/2016 
    D          2     Blue      9/8/2016
    E          1     Green     9/8/2016 
    F          4     Yellow    9/8/2016 
    G          3     Orange    9/8/2016 
')
My final data.table(dt3) should have the following changes:A and C are both different in dt2 than dt1, thats why the new(different) rows from dt2 gets inserted into the final table alongside all rows from dt1. G is a totally new product that was not in dt1, thats why it makes it into the final table.
    Product  Level   Color     ReviewDate
    A          0     Blue      9/7/2016
    A          1     Black     9/8/2016
    B          1     Red       9/7/2016
    C          1     Purple    9/7/2016 
    C          5     White     9/8/2016
    D          2     Blue      9/7/2016
    E          1     Green     9/7/2016 
    F          4     Yellow    9/7/2016
    G          3     Orange    9/8/2016 
I have tried:
setkey(dt1, Product)
setkey(dt2, Product)
dt3<- dt1[dt2]
setkey(dt3,Product,ReviewDate)
回答1:
You can stack and uniqify:
unique(rbind(dt1, dt2), by=c("Product", "Level", "Color"))
回答2:
Another alternative is to only rbind the subset of the data which is different (avoids the creation of one big data.table which contains dt1 and dt2)
dt3 <- rbind(dt1, setDT(dt2)[!dt1, on=c("Product", "Level", "Color")])
dt3[order(Product, ReviewDate),]
回答3:
Using merge...
d<-merge(dt1, dt2, by=c("Product","Level","Color"), all.x=T,all.y=TRUE)
d$ReviewDate <-ifelse(is.na(d$ReviewDate.x), d$ReviewDate.y, d$ReviewDate.x)
as.data.frame(select(d, 1,2,3,6))
   Product Level  Color ReviewDate
1       A     0   Blue   9/7/2016
2       A     1  Black   9/8/2016
3       B     1    Red   9/7/2016
4       C     1 Purple   9/7/2016
5       C     5  White   9/8/2016
6       D     2   Blue   9/7/2016
7       E     1  Green   9/7/2016
8       F     4 Yellow   9/7/2016
9       G     3 Orange   9/8/2016
来源:https://stackoverflow.com/questions/39398135/updating-data-table-by-inserting-new-rows-that-are-different-from-old-rows