drop_duplicates not working in pandas?

抹茶落季 2020-12-07 01:58

The purpose of my code is to import 2 Excel files, compare them, and print out the differences to a new Excel file.

However, after concatenating all the data, and us

5 answers
  • 2020-12-07 02:43

    I have just had this issue, and this was not the solution in my case.

    It may be in the docs (I admittedly haven't looked), and crucially this only applies when dealing with date-based unique rows: the 'date' column must actually have a datetime dtype.

    If the date data has the pandas object dtype, drop_duplicates may not work as expected; run pd.to_datetime on the column first.
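
    A minimal sketch of how this can happen, assuming the object column mixes a Timestamp with a string for the same day (for example after combining data from two sources). The column names and values here are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the same day appears once as a Timestamp and once as a string.
df = pd.DataFrame({
    "date": [pd.Timestamp("2020-12-07"), "2020-12-07", "2020-12-08"],
    "value": [1, 1, 2],
})

# The column has object dtype, so the Timestamp and the string are compared
# as different Python objects and no row counts as a duplicate.
rows_before = len(df.drop_duplicates())

# Normalize the column to a real datetime dtype, then deduplicate again.
df["date"] = pd.to_datetime(df["date"])
rows_after = len(df.drop_duplicates())

print(rows_before, rows_after)
```

    With this data the first call keeps all three rows, and the call after pd.to_datetime keeps two.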

  • 2020-12-07 02:44

    You've got inplace=False so you're not modifying df. You want either

     df.drop_duplicates(subset=None, keep="first", inplace=True)
    

    or

     df = df.drop_duplicates(subset=None, keep="first", inplace=False)
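
    As a small illustration with a made-up DataFrame (the names df, a, and b are placeholders), calling drop_duplicates without assigning the result leaves df untouched, while either form above removes the duplicate row:

```python
import pandas as pd

# Hypothetical frame with one fully duplicated row.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Result is discarded, so df still has the duplicate.
df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_unassigned = len(df)

# Option 1: assign the returned frame to a name.
deduped = df.drop_duplicates(subset=None, keep="first", inplace=False)

# Option 2: modify df itself.
df.drop_duplicates(subset=None, keep="first", inplace=True)

print(rows_unassigned, len(deduped), len(df))
```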
    
  • 2020-12-07 02:46

    Might help anyone in the future.

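    I had a column with dates, where I tried to remove duplicates without success. Since it was not important to keep the column as a date for further operations, I converted the data from type object to string. Note that this converts every column of the DataFrame, not just the date column.

    df = df.astype('str')
    

    Then I applied @Keith's answer. Assign the result without inplace=True, because with inplace=True the method returns None and df would be overwritten with None:

    df = df.drop_duplicates(subset=None, keep="first")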
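
    A minimal sketch of the string-conversion approach, assuming the object column mixes datetime.date objects with strings that have the same text form. The data is invented for illustration:

```python
import datetime

import pandas as pd

# Hypothetical data: the same day as a datetime.date and as a string.
df = pd.DataFrame({
    "date": [datetime.date(2020, 12, 7), "2020-12-07", "2020-12-08"],
    "value": [1, 1, 2],
})

# A date object and a string never compare equal, so nothing is dropped.
rows_before = len(df.drop_duplicates())

# Convert every column to strings; both forms become "2020-12-07".
df = df.astype("str")
df = df.drop_duplicates(subset=None, keep="first")
rows_after = len(df)

print(rows_before, rows_after)
```

    This only helps when the different representations produce identical strings; a Timestamp, for instance, would become "2020-12-07 00:00:00" and still not match.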
    
  • 2020-12-07 02:48

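    If you are using a DatetimeIndex in your DataFrame and want to remove repeated index entries, this will not work, because drop_duplicates compares column values only and ignores the index: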

    df.drop_duplicates(subset=None, keep="first", inplace=True)
    

    Instead one can use:

    df = df[~df.index.duplicated()]
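
    A short sketch of the difference, using an invented frame whose DatetimeIndex repeats one timestamp while the column values differ:

```python
import pandas as pd

# Hypothetical frame: the first two rows share an index label.
idx = pd.DatetimeIndex(["2020-12-07", "2020-12-07", "2020-12-08"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# drop_duplicates looks at column values only, so all rows survive.
by_values = df.drop_duplicates(subset=None, keep="first")

# Index.duplicated marks repeated labels; ~ keeps the first of each.
by_index = df[~df.index.duplicated(keep="first")]

print(len(by_values), len(by_index))
```

    Here keep="first" is the default and is spelled out only for clarity; keep="last" would retain the later row instead.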
    
  • 2020-12-07 02:52

    The use of inplace=False tells pandas to return a new dataframe with duplicates dropped, so you need to assign that back to df:

    df = df.drop_duplicates(subset=None, keep="first", inplace=False)
    

    or pass inplace=True to tell pandas to drop duplicates in the current dataframe:

    df.drop_duplicates(subset=None, keep="first", inplace=True)
    