Python - Delete duplicates in a dataframe based on two columns combinations?

名媛妹妹 2020-11-29 10:53

I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and would like to eliminate the duplicates based

3 Answers

无人及你 (OP)

2020-11-29 11:38
You can convert each row's (Name1, Name2) pair to a frozenset and then call duplicated on the resulting Series (pd.Series.duplicated).
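```
import pandas as pd

df = pd.DataFrame({'Name1': ['Juan', 'Ale'], 'Name2': ['Ale', 'Juan'], 'Value': [1, 1]})

# Treat each (Name1, Name2) pair as an unordered set, then keep only the first occurrence
res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]

print(res)

#   Name1 Name2  Value
# 0  Juan   Ale      1
```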
frozenset is necessary instead of set because duplicated uses hashing to check for duplicates, and a plain set is mutable and therefore not hashable.
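A quick way to see the difference, as a small sketch using only built-ins (the variable names here are illustrative and not part of the answer above):

```python
# Two frozensets with the same members compare equal regardless of order,
# and equal objects always share a hash, so duplicated() can match them.
pair_a = frozenset(['Juan', 'Ale'])
pair_b = frozenset(['Ale', 'Juan'])
print(pair_a == pair_b)              # True
print(hash(pair_a) == hash(pair_b))  # True

# A plain set is mutable, so hashing it raises TypeError.
try:
    hash({'Juan', 'Ale'})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)               # False
```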

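This approach scales better with the number of columns than with the number of rows, because apply with axis=1 makes one Python-level call per row. For a large number of rows, use @Wen's sort-based algorithm.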
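The sort-based answer itself is not reproduced on this page. As a rough sketch of what a sort-based approach could look like (an assumption, not necessarily @Wen's exact code), one can sort the two name columns within each row using NumPy so that both orderings of a pair become identical, then flag repeats. The third row and the names `pairs`, `mask`, and `res_sorted` are added here purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name1': ['Juan', 'Ale', 'Ana'],
    'Name2': ['Ale', 'Juan', 'Luis'],
    'Value': [1, 1, 2],
})

# Sort the two names within each row so ('Juan', 'Ale') and ('Ale', 'Juan')
# both become ('Ale', 'Juan'). This is vectorised over rows.
pairs = np.sort(df[['Name1', 'Name2']].to_numpy(dtype=object), axis=1)

# Flag rows whose sorted pair has already been seen, keeping the first occurrence.
mask = pd.DataFrame(pairs, index=df.index).duplicated()

res_sorted = df[~mask]
print(res_sorted.index.tolist())  # [0, 2]
```

The original column order is preserved in `res_sorted` because the sorted pairs are only used to build the mask.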