How to drop duplicates based on two or more columns in a pandas DataFrame

Happy的楠姐 2020-12-06 20:24

Let's say this is my DataFrame:

df = pd.DataFrame({ 'bio' : ['1', '1', '1', '4'],
                'center' : ['one', 'one', 'two', 'three'


        
1 Answer
  • 2020-12-06 21:03

    Your syntax is wrong. Here's the correct way:

    df.drop_duplicates(subset=['bio', 'center', 'outcome'])
    

    Or, since these three columns are all of the columns in this frame, simply:

    df.drop_duplicates()
    

    Both return the following:

      bio center outcome
    0   1    one       f
    2   1    two       f
    3   4  three       f
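
As a quick check of the claim that both calls agree, here is a minimal sketch. The question's snippet is cut off, so the frame below is an assumption reconstructed from the output shown above: row 1 must repeat row 0 for it to be dropped, which implies an `outcome` column of four `'f'` values.

```python
import pandas as pd

# Assumed frame, reconstructed from the answer's output (not the asker's full code).
df = pd.DataFrame({
    "bio": ["1", "1", "1", "4"],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# Explicit subset naming every column.
by_subset = df.drop_duplicates(subset=["bio", "center", "outcome"])

# No subset: all columns are compared by default.
by_default = df.drop_duplicates()

print(by_subset.equals(by_default))
print(by_subset.index.tolist())
```

The original index labels are kept, which is why the result jumps from 0 to 2; chain `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.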
    

    See the DataFrame.drop_duplicates documentation for syntax details. subset should be a single column label or a sequence of column labels.
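
To illustrate how `subset` interacts with the `keep` parameter, here is a short sketch. It again assumes the frame reconstructed from the answer's output; the variable names are only illustrative.

```python
import pandas as pd

# Assumed frame, reconstructed from the answer's output.
df = pd.DataFrame({
    "bio": ["1", "1", "1", "4"],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# Only 'bio' decides what counts as a duplicate; rows 0, 1 and 2 share bio == '1'.
first_per_bio = df.drop_duplicates(subset=["bio"])              # keep the first of each group
last_per_bio = df.drop_duplicates(subset=["bio"], keep="last")  # keep the last of each group

# keep=False drops every row that has a duplicate on ('bio', 'center').
no_repeats = df.drop_duplicates(subset=["bio", "center"], keep=False)

# A single label is accepted as well as a list of labels.
single_label = df.drop_duplicates(subset="bio")

print(first_per_bio.index.tolist())
print(last_per_bio.index.tolist())
print(no_repeats.index.tolist())
```

`drop_duplicates` returns a new DataFrame and leaves `df` untouched, so assign the result back if the deduplicated frame is what you want to keep.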
