问题
I have a dataframe containing 7 columns and would like to records that have same info in the first two columns even they are in reverse order.
Here is a snippet of my df
zip1 zip2 APP PCR SCR APJ PJR
1 01701 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
2 01701 01702 0.9887567 0.9898379 0.9811615 0.9993856 0.9842659
3 01701 01703 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
4 01701 01704 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
5 01704 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
I know how to use unique, but the twist here is that I'd like to treat instances where zip1 = a and zip2 = b the same as zip1 = b and zip2 = a. So I'd essentially want only one records for those two instances. So for example I'd only want column 4 and not column 5 Any advice?
Thanks, Ben
回答1:
First create a new vector which identifies rows with a particular zip pair but doesn't distinguish based upon the ordering:
zipUp<-paste(pmin(df$zip1,df$zip2),pmax(df$zip1,df$zip2))
Now find duplicates in that vector, and discard them from the original data frame.
dups<-duplicated(zipUp)
newdf<-df[!dups,]
I am assuming that the first two columns will not contain NA. If they do you will need to adjust the pmin, pmax calls to keep any non NA value for each pair
来源:https://stackoverflow.com/questions/28047997/deduping-column-pairs-in-r