Deduping Column pairs in R

问题

I have a dataframe containing 7 columns and would like to records that have same info in the first two columns even they are in reverse order.

Here is a snippet of my df

 zip1  zip2       APP       PCR       SCR       APJ       PJR
1 01701 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
2 01701 01702 0.9887567 0.9898379 0.9811615 0.9993856 0.9842659
3 01701 01703 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
4 01701 01704 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
5 01704 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000

I know how to use unique, but the twist here is that I'd like to treat instances where zip1 = a and zip2 = b the same as zip1 = b and zip2 = a. So I'd essentially want only one records for those two instances. So for example I'd only want column 4 and not column 5 Any advice?

Thanks, Ben

回答1:

First create a new vector which identifies rows with a particular zip pair but doesn't distinguish based upon the ordering:

zipUp<-paste(pmin(df$zip1,df$zip2),pmax(df$zip1,df$zip2))

Now find duplicates in that vector, and discard them from the original data frame.

dups<-duplicated(zipUp)

newdf<-df[!dups,]

I am assuming that the first two columns will not contain NA. If they do you will need to adjust the pmin, pmax calls to keep any non NA value for each pair

来源：https://stackoverflow.com/questions/28047997/deduping-column-pairs-in-r

标签

deduplication

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!