How to find differences in elements of 2 data frames based on 2 unique identifiers

Submitted by 狂风中的少年 on 2019-12-13 08:23:34

Question


I have 2 very large data frames similar to the following:

df1<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGR"))

> df1
  DS.ID OP.ID P.ID
1   123  xxab  AAC
2   214  xxac  JGK
3   543  xxad  DIF
4   325  xxae  ADL
5   123  xxaf  AAC
6   214  xxaq  JGR

df2<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGS"))

> df2
  DS.ID OP.ID P.ID
1   123  xxab  AAC
2   214  xxac  JGK
3   543  xxad  DIF
4   325  xxae  ADL
5   123  xxaf  AAC
6   214  xxaq  JGS

The unique ID is the combination of DS.ID and OP.ID: DS.ID can repeat, but a given DS.ID/OP.ID combination cannot. I want to find the instances where P.ID changes. Note also that a given DS.ID/OP.ID combination will not necessarily be in the same row in both data frames.
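A quick base-R sketch of the two assumptions above, using the question's df1. It checks that DS.ID alone repeats while the (DS.ID, OP.ID) pair is unique, and that matching rows on the key (rather than by position) is unaffected by row order. The reversed copy df1_rev is purely illustrative, and stringsAsFactors = FALSE is an added assumption to keep the IDs as character.

```r
# Question's df1; stringsAsFactors = FALSE keeps the IDs as character
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  stringsAsFactors = FALSE)
key <- c("DS.ID", "OP.ID")

# DS.ID alone repeats: anyDuplicated() returns the index of the first repeat
anyDuplicated(df1$DS.ID)   # 5
# ...but the (DS.ID, OP.ID) pair is unique: 0 means no duplicated rows
anyDuplicated(df1[key])    # 0

# Matching on the key is insensitive to row order:
# df1_rev is an illustrative copy of df1 with its rows reversed
df1_rev <- df1[rev(seq_len(nrow(df1))), ]
m <- merge(df1, df1_rev, by = key)
all(m$P.ID.x == m$P.ID.y)  # TRUE
```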

In the example above, it would return row 6, since its P.ID changed from JGR to JGS. I'd like to write both the initial and the final values to a data frame.
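One possible way to get that result, sketched in base R on the question's data: merge the two frames on the composite key, label the two P.ID columns with suffixes, and keep only the rows where they differ. The suffixes ".initial"/".final" and the names merged/changes are illustrative choices, not part of the question.

```r
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  stringsAsFactors = FALSE)

# Match rows on the composite key; suffixes label which frame each P.ID came from
merged <- merge(df1, df2, by = c("DS.ID", "OP.ID"),
                suffixes = c(".initial", ".final"))

# Keep only keys whose P.ID differs, with both the initial and the final value
changes <- merged[merged$P.ID.initial != merged$P.ID.final,
                  c("DS.ID", "OP.ID", "P.ID.initial", "P.ID.final")]
print(changes)
```

merge() performs an inner join by default, so keys present in only one frame are dropped; with all = TRUE they are kept with NA values, and the != test would then need an is.na() guard.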

I have a feeling the initial step would be

rbind.fill(df1,df2)

(rbind.fill from plyr rather than plain rbind, because the data frames I'm looping through have additional columns.)
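The stacking idea can be sketched in base R without rbind.fill: if only the key columns and the compared column are taken from each frame, the pieces have identical columns and plain rbind() works even when the full frames differ. The source column and the names stacked/n_ids/changed_keys are illustrative assumptions. Keys with more than one distinct P.ID across the stack are the ones that changed.

```r
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  stringsAsFactors = FALSE)
key  <- c("DS.ID", "OP.ID")
cols <- c(key, "P.ID")

# Stack only the shared columns, tagging each row with the frame it came from
stacked <- rbind(cbind(df1[cols], source = "initial"),
                 cbind(df2[cols], source = "final"))

# Count distinct P.ID values per key (the P.ID column of n_ids holds the count)
n_ids <- aggregate(P.ID ~ DS.ID + OP.ID, data = stacked,
                   FUN = function(x) length(unique(x)))
changed_keys <- n_ids[n_ids$P.ID > 1, key]

# Pull back both the initial and the final row for each changed key
changed_rows <- merge(stacked, changed_keys, by = key)
changed_rows
```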

Edit: Assume there are other columns that have different values as well. Thus, duplicated() would not work unless you isolated the compared columns into their own data frame. But I'll be doing this for many columns and many data frames, so for speed's sake I'd rather not go with that method.
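To cover many columns at once, one option is a small helper that merges once on the key and then checks each non-key column, returning a long table of (key, column, initial, final). find_changes is a hypothetical name, and the Qty column below is invented purely to show more than one compared column. Values are compared as character, and NA on exactly one side counts as a change.

```r
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  Qty   = c(1, 2, 3, 4, 5, 6),   # hypothetical extra column
                  stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  Qty   = c(1, 2, 3, 9, 5, 6),   # hypothetical extra column
                  stringsAsFactors = FALSE)

# Hypothetical helper: compare each column in `cols` between `old` and `new`,
# matching rows on `key`; returns one row per changed value, or NULL if none
find_changes <- function(old, new, key,
                         cols = setdiff(intersect(names(old), names(new)), key)) {
  m <- merge(old[c(key, cols)], new[c(key, cols)], by = key,
             suffixes = c(".initial", ".final"))
  out <- lapply(cols, function(col) {
    a <- as.character(m[[paste0(col, ".initial")]])
    b <- as.character(m[[paste0(col, ".final")]])
    # A change is two different non-NA values, or NA on exactly one side
    changed <- (!is.na(a) & !is.na(b) & a != b) | (is.na(a) != is.na(b))
    if (!any(changed)) return(NULL)
    data.frame(m[changed, key, drop = FALSE], column = col,
               initial = a[changed], final = b[changed],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)  # NULL entries (unchanged columns) are skipped by rbind
}

find_changes(df1, df2, key = c("DS.ID", "OP.ID"))
```

The long output shape stays the same however many columns are compared, so results from many pairs of data frames can simply be stacked with rbind().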


Answer 1:


In the following code, an ident of 0 indicates that P.ID differs between the two data frames for that DS.ID/OP.ID combination:

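library(plyr)
# Join on the composite key; the clashing P.ID columns become P.ID.x (df1) and P.ID.y (df2)
ll <- merge(df1, df2, by = c("DS.ID", "OP.ID"))
# ident is 1 when P.ID.x matches P.ID.y, and 0 (nomatch) when it changed
ddply(ll, .(DS.ID, OP.ID), summarize, ident = match(P.ID.x, P.ID.y, nomatch = 0))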
  DS.ID OP.ID ident
1   123  xxab     1
2   123  xxaf     1
3   214  xxac     1
4   214  xxaq     0
5   325  xxae     1
6   543  xxad     1
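Because merge() already returns exactly one row per (DS.ID, OP.ID) pair here, the per-group ddply() call is not strictly required; a vectorized base-R sketch of the same ident column follows. Values are compared as character, which also works if P.ID is a factor with different levels in each frame. One difference from match(): where both values are NA, match() gives 1, whereas == gives NA.

```r
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  stringsAsFactors = FALSE)

ll <- merge(df1, df2, by = c("DS.ID", "OP.ID"))
# 1 where P.ID agrees between df1 (.x) and df2 (.y), 0 where it changed
ll$ident <- as.integer(as.character(ll$P.ID.x) == as.character(ll$P.ID.y))
# Rows that changed, with the old (P.ID.x) and new (P.ID.y) values side by side
ll[ll$ident == 0, ]
```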


Source: https://stackoverflow.com/questions/19913470/how-to-find-differences-in-elements-of-2-data-frames-based-on-2-unique-identifie
