Pandas drop duplicates; values in reverse order

Submitted by 落花浮王杯 on 2019-12-11 03:39:07

Question


I'm trying to find a way to use pandas drop_duplicates() so that it treats rows as duplicates when their values appear in reverse order.

For example, I am trying to find transactions where a customer purchased both apples and bananas, but the data collection may have recorded the items in either order. In other words, taken as a whole order, the transaction is a duplicate because it is made up of the same items.

I want the following to be recognized as duplicates:

Item1   Item2
Apple   Banana
Banana  Apple
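A minimal sketch reproducing the example above, assuming the column names Item1 and Item2 from the table. It shows why a plain drop_duplicates() keeps both rows: values are compared column by column, so (Apple, Banana) and (Banana, Apple) count as different rows.

```python
import pandas as pd

# Sample data from the question: the same pair recorded in opposite orders.
df = pd.DataFrame({"Item1": ["Apple", "Banana"],
                   "Item2": ["Banana", "Apple"]})

# drop_duplicates() compares values position by position, so nothing is dropped.
plain = df.drop_duplicates()
print(len(plain))  # 2
```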

Answer 1:


First sort the values within each row by applying sorted along axis=1, then call drop_duplicates:

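import pandas as pd

df = pd.DataFrame({'Item1': ['Apple', 'Banana'], 'Item2': ['Banana', 'Apple']})

# sort the values inside each row; result_type='broadcast' keeps the original
# index and columns (in newer pandas, without it apply returns a Series of
# lists, which drop_duplicates cannot hash)
df = df.apply(sorted, axis=1, result_type='broadcast').drop_duplicates()
print(df)
   Item1   Item2
0  Apple  Banana

# if you need to restrict this to specific columns
cols = ['Item1','Item2']
df[cols] = df[cols].apply(sorted, axis=1, result_type='broadcast')
df = df.drop_duplicates(subset=cols)
print(df)
   Item1   Item2
0  Apple  Banana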
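The approach above overwrites the item columns with their sorted values. If each kept row should retain its original order, one alternative (not part of the original answer) is to build an order-insensitive key per row, here a frozenset, and filter with duplicated(). This sketch assumes the item values are hashable. With exactly two columns a frozenset is safe, but with more columns rows such as (A, A, B) and (A, B, B) would produce the same key; sorting, as in the answer, avoids that.

```python
import pandas as pd

df = pd.DataFrame({"Item1": ["Apple", "Banana", "Apple"],
                   "Item2": ["Banana", "Apple", "Cherry"]})
cols = ["Item1", "Item2"]

# One order-insensitive key per row: a frozenset ignores which column held which item.
key = pd.Series([frozenset(row) for row in df[cols].to_numpy()], index=df.index)

# Keep the first occurrence of each key; the original values are left untouched.
deduped = df[~key.duplicated()]
print(deduped)
```

Passing keep='last' to duplicated() would keep the last occurrence of each pair instead.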

Another solution uses numpy.sort together with the DataFrame constructor:

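import numpy as np
import pandas as pd

# np.sort orders the values inside each row; the outer parentheses let the
# chained .drop_duplicates() continue onto the next line
df = (pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
        .drop_duplicates())
print(df)
   Item1   Item2
0  Apple  Banana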
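A variation on the numpy.sort approach, offered as a sketch rather than the answer's own code: use the sorted array only as a comparison key and index the original frame with the resulting mask, so the kept rows keep their original column order. It assumes the item columns hold mutually comparable values (for example, all strings), since np.sort has to order them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item1": ["Banana", "Apple", "Apple"],
                   "Item2": ["Apple", "Banana", "Cherry"]})
cols = ["Item1", "Item2"]

# Sort each row's values only to build the comparison key.
key = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)

# True for rows whose sorted values already appeared in an earlier row.
dupes = key.duplicated()

# Filter the original frame, so the first row keeps its (Banana, Apple) order.
result = df[~dupes]
print(result)
```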


Source: https://stackoverflow.com/questions/43528573/pandas-drop-duplicates-values-in-reverse-order
