Python - Delete duplicates in a dataframe based on the combination of two columns?

名媛妹妹 2020-11-29 10:53

I have a dataframe with 3 columns in Python:

Name1 Name2 Value
Juan  Ale   1
Ale   Juan  1

and would like to eliminate the duplicates based on the combination of the two name columns (Name1 and Name2), regardless of their order.

3 Answers
  •  无人及你
    2020-11-29 11:38

    You can convert each row's pair of names to a frozenset and use pd.Series.duplicated.

    res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
    
    print(res)
    
      Name1 Name2  Value
    0  Juan   Ale      1
    

    frozenset is necessary instead of set because duplicated relies on hashing to find duplicates, and a mutable set is not hashable.
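
As a small illustration of that point (a sketch only; the variable names are made up for this example), a frozenset built from the same two names hashes identically whatever the order, while hashing a plain set raises TypeError:

```python
# Two name pairs in opposite order, mirroring the question's example rows.
a = frozenset(["Juan", "Ale"])
b = frozenset(["Ale", "Juan"])

# Order does not matter: equal frozensets also have equal hashes.
same_pair = (a == b) and (hash(a) == hash(b))

# A plain set is mutable, so Python refuses to hash it.
try:
    hash({"Juan", "Ale"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(same_pair, set_is_hashable)
```

This is why the apply step has to produce frozenset objects before duplicated is called.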

    This approach scales better with columns than with rows, because apply with axis=1 calls frozenset once per row in Python. For a large number of rows, use @Wen's sort-based algorithm.
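
The sort-based answer by @Wen is not reproduced on this page, so the following is only a sketch of one common sort-based approach and not necessarily theirs. It assumes NumPy is available, and the third row (Ana, Luis) is hypothetical data added to show that a non-duplicate survives:

```python
import numpy as np
import pandas as pd

# Example frame from the question plus one hypothetical extra row.
df = pd.DataFrame({
    "Name1": ["Juan", "Ale", "Ana"],
    "Name2": ["Ale", "Juan", "Luis"],
    "Value": [1, 1, 2],
})

# Sort the two names within each row so that (Juan, Ale) and
# (Ale, Juan) become the same ordered pair.
pairs = pd.DataFrame(
    np.sort(df[["Name1", "Name2"]].to_numpy(), axis=1),
    index=df.index,
)

# Keep the first occurrence of every unordered pair.
res = df[~pairs.duplicated()]
print(res)
```

Because the sorting is done by NumPy across the whole array rather than by a Python call per row, this tends to handle many rows better than the apply-based version.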
