drop_duplicates not working in pandas?


抹茶落季

The purpose of my code is to import 2 Excel files, compare them, and print out the differences to a new Excel file.

However, after concatenating all the data, and us
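The question is cut off here. A common way to find the rows that differ between two tables is to concatenate them and then drop every row that appears more than once. Below is a minimal sketch of that idea. It assumes both sheets are already loaded as DataFrames, leaves out the Excel reading and writing steps, and uses hypothetical column names:

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets (e.g. loaded with pd.read_excel).
old = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
new = pd.DataFrame({"id": [1, 2, 4], "name": ["a", "x", "d"]})

# Stack both frames, then remove every row that occurs more than once.
# keep=False drops all copies, so only rows present in just one frame remain.
# Assumes neither frame contains duplicate rows of its own.
combined = pd.concat([old, new], ignore_index=True)
diff = combined.drop_duplicates(keep=False)

print(diff)
# diff.to_excel("differences.xlsx", index=False)  # hypothetical path; needs openpyxl
```

Here the row (1, "a") appears in both frames, so both copies are removed. The remaining four rows are the differences.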


5 Answers

一向

2020-12-07 02:43

I have just had this issue, and the inplace fix suggested in the other answers was not the solution.

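It may be in the docs (I admittedly haven't looked), but crucially this only matters when rows are identified as unique by a date column: that 'date' column must actually be stored as dates.

If the date data has pandas object dtype, drop_duplicates can fail to find the duplicates, because values that represent the same date (for example, a Timestamp in one row and a string in another) do not compare equal. Convert the column with pd.to_datetime first.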
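A minimal sketch of this situation. It assumes a hypothetical object-dtype column in which the same date was read once as a Timestamp and once as a string, which can happen when an Excel column has inconsistently formatted cells:

```python
import pandas as pd

# Hypothetical data: dtype=object keeps pandas from inferring datetimes,
# so row 0 holds a Timestamp and rows 1 and 2 hold plain strings.
df = pd.DataFrame({
    "date": pd.Series([pd.Timestamp("2020-12-07"), "2020-12-07", "2020-12-08"], dtype=object),
    "value": [1, 1, 2],
})

# Rows 0 and 1 look identical when printed, but a Timestamp never equals a str,
# so drop_duplicates treats them as different rows.
before = df.drop_duplicates()

# Convert the column to datetime64 first; rows 0 and 1 now compare equal.
df["date"] = pd.to_datetime(df["date"])
after = df.drop_duplicates()

print(len(before), len(after))  # 3 2
```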

不思量自难忘°

2020-12-07 02:44
You've got inplace=False, so you're not modifying df. You want either
```
 df.drop_duplicates(subset=None, keep="first", inplace=True)
```
or
```
 df = df.drop_duplicates(subset=None, keep="first", inplace=False)
```
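A small sketch of both options on a hypothetical toy frame (the columns a and b are made up). The in-place variant runs on a copy only so that both fixes fit in one script:

```python
import pandas as pd

# Toy frame in which rows 0 and 1 are exact duplicates.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# inplace=False (the default) returns a new DataFrame and leaves df untouched.
df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_after_discarded_call = len(df)  # still 3: the returned copy was thrown away

# Fix 1: keep the returned copy by assigning it.
df_assigned = df.drop_duplicates(subset=None, keep="first", inplace=False)

# Fix 2: mutate the frame in place; the call itself returns None.
df_inplace = df.copy()
returned = df_inplace.drop_duplicates(subset=None, keep="first", inplace=True)

print(rows_after_discarded_call, len(df_assigned), len(df_inplace), returned)
# 3 2 2 None
```

Because the in-place call returns None, combining it with an assignment (df = df.drop_duplicates(..., inplace=True)) leaves df set to None.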
渐次进展

2020-12-07 02:46
This might help someone in the future.

I had a column of dates from which I tried to remove duplicates, without success. If you don't need to keep the column as a date for further operations, you can do what I did and convert it from object dtype to string (note that this converts every column, not just the date column):
```
df = df.astype('str')
```
Then I applied @Keith's answer:
```
# Assign the returned copy; with inplace=True the call would return None instead
df = df.drop_duplicates(subset=None, keep="first")
```
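A minimal sketch of this workaround. It assumes a hypothetical object-dtype column where one date was read as a datetime.date and the same date elsewhere as a string. It also shows that inplace=True returns None, so its result should not be assigned:

```python
import datetime

import pandas as pd

# Hypothetical data: the same date stored once as a datetime.date and once as a str.
df = pd.DataFrame({
    "date": pd.Series([datetime.date(2020, 12, 7), "2020-12-07", "2020-12-08"], dtype=object),
    "value": [1, 1, 2],
})

# date(2020, 12, 7) != "2020-12-07", so nothing is dropped here.
unchanged = df.drop_duplicates()

# Cast every column to str; str(date(2020, 12, 7)) == "2020-12-07".
df = df.astype("str")

# inplace=True mutates df and returns None, so don't assign its result to df.
result = df.drop_duplicates(subset=None, keep="first", inplace=True)

print(len(unchanged), len(df), result)  # 3 2 None
```

The trade-off is that the date column is now plain text, so date arithmetic or sorting by date would need a conversion back.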
-上瘾入骨i

2020-12-07 02:48
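If you are using a DatetimeIndex in your DataFrame and want to remove rows with repeated timestamps, this will not work, because drop_duplicates compares column values only and ignores the index: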
```
df.drop_duplicates(subset=None, keep="first", inplace=True)
```
Instead, you can keep only the first row for each index label:
```
df = df[~df.index.duplicated()]
```
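A small sketch on a hypothetical toy frame whose DatetimeIndex repeats one timestamp with different column values:

```python
import pandas as pd

# Toy frame: 2020-12-07 appears twice in the index, with different values.
idx = pd.DatetimeIndex(["2020-12-07", "2020-12-07", "2020-12-08"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# drop_duplicates compares column values only, so all three rows survive.
by_columns = df.drop_duplicates(subset=None, keep="first")

# index.duplicated() marks repeated labels after the first; ~ keeps the rest.
by_index = df[~df.index.duplicated(keep="first")]

print(len(by_columns), by_index["value"].tolist())  # 3 [1, 3]
```

The reverse also holds: drop_duplicates would treat two rows with equal values but different timestamps as duplicates and drop one of them, since it never looks at the index.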
别那么骄傲

2020-12-07 02:52
The use of inplace=False tells pandas to return a new dataframe with duplicates dropped, so you need to assign that back to df:
```
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
```
Or use inplace=True to tell pandas to drop duplicates in the current DataFrame (the call then returns None, so don't assign its result):
```
df.drop_duplicates(subset=None, keep="first", inplace=True)
```