The purpose of my code is to import 2 Excel files, compare them, and print out the differences to a new Excel file.
However, after concatenating all the data, and us
I just had this issue, and fixing inplace was not the solution for me.
It may be in the docs (I admittedly haven't looked), but crucially this only applies when dealing with date-based unique rows: the 'date' column must actually be formatted as a datetime.
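If the date data is of pandas object dtype (for example strings, or a mix of strings and Timestamps), values representing the same date may not compare equal, so drop_duplicates will not remove them. Convert the column with pd.to_datetime first.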
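Here is a minimal sketch of that situation, assuming (hypothetically) that one file's 'date' column was parsed as datetime64 while the other's arrived as plain strings. After concatenation the column is object dtype, a Timestamp and a string for the same day do not match, and only converting with pd.to_datetime lets drop_duplicates see the duplicate. The frames `a` and `b` are made-up data.

```python
import pandas as pd

# Hypothetical data: file A's 'date' column was parsed as datetime64,
# file B's 'date' column came in as plain strings
a = pd.DataFrame({"date": pd.to_datetime(["2021-03-01", "2021-03-02"]), "value": [5, 7]})
b = pd.DataFrame({"date": ["2021-03-01"], "value": [5]})

combined = pd.concat([a, b], ignore_index=True)

# 'date' is now object dtype: the Timestamp and the str for 2021-03-01
# are different objects, so no row is treated as a duplicate
rows_before = len(combined.drop_duplicates())

# Normalise the column to datetime64 first, then drop duplicates
combined["date"] = pd.to_datetime(combined["date"])
deduped = combined.drop_duplicates(keep="first")
```

Converting the column once, right after the concat, keeps the dates usable for later date arithmetic, unlike converting everything to strings.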
You've got inplace=False, so you're not modifying df. You want either
df.drop_duplicates(subset=None, keep="first", inplace=True)
or
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
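A small sketch of the difference, using a hypothetical two-column frame: calling drop_duplicates with inplace=False and discarding the result leaves df as it was, while assigning the returned DataFrame keeps the deduplicated rows.

```python
import pandas as pd

# Hypothetical frame with one fully duplicated row
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# inplace=False returns a new DataFrame; without assignment df is unchanged
df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_without_assignment = len(df)  # still 3

# Assigning the returned DataFrame back to df keeps the deduplicated result
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_with_assignment = len(df)  # 2
```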
This might help someone in the future.
I had a column with dates from which I tried to remove duplicates, without success. If you don't need to keep the column as dates for further operations, you can do what I did and convert it from type object to string:
df = df.astype('str')
Then I applied @Keith's answer, assigning the result back to df:
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
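A minimal sketch of the string-conversion route, assuming a hypothetical object-dtype column of datetime.date values. It also shows why assigning the result of an inplace=True call is a trap: with inplace=True, drop_duplicates returns None, so `df = df.drop_duplicates(..., inplace=True)` would replace df with None.

```python
import datetime

import pandas as pd

# Hypothetical data: an object-dtype column holding datetime.date values
df = pd.DataFrame({
    "date": [datetime.date(2021, 3, 1), datetime.date(2021, 3, 1), datetime.date(2021, 3, 2)],
    "value": [5, 5, 7],
})

# Convert every column to str (only if real dates are no longer needed)
df = df.astype(str)

# Pitfall: with inplace=True the method mutates the frame and returns None,
# so assigning its return value would throw the DataFrame away
returned = df.copy().drop_duplicates(keep="first", inplace=True)

# Correct: assign the returned copy (inplace defaults to False)
df = df.drop_duplicates(keep="first")
```

Note that df.astype(str) converts every column, so numeric columns such as 'value' become strings too; converting only the date column would avoid that.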
If you are using a DatetimeIndex in your DataFrame, this will not work, because drop_duplicates compares only the column values and ignores the index:
df.drop_duplicates(subset=None, keep="first", inplace=True)
Instead, you can use:
df = df[~df.index.duplicated()]
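A short sketch contrasting the two approaches on a hypothetical frame whose DatetimeIndex repeats a timestamp while the column values differ: drop_duplicates keeps every row because it ignores the index, whereas filtering with Index.duplicated removes the repeated timestamp.

```python
import pandas as pd

# Hypothetical frame: the first two rows share a timestamp but differ in value
idx = pd.DatetimeIndex(["2021-01-01", "2021-01-01", "2021-01-02"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Column-based deduplication: all rows survive because the values differ
by_columns = df.drop_duplicates(keep="first")

# Index-based deduplication: keep only the first row for each timestamp
by_index = df[~df.index.duplicated(keep="first")]
```

The reverse also applies: rows with identical values but different timestamps would be collapsed by drop_duplicates, which may not be what you want for time series.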
Using inplace=False tells pandas to return a new DataFrame with the duplicates dropped, so you need to assign that back to df:
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
or use inplace=True to tell pandas to drop the duplicates in the current DataFrame:
df.drop_duplicates(subset=None, keep="first", inplace=True)