DataFrame.drop_duplicates and DataFrame.drop not removing rows

后端未结

关注

 2  1593

I have read in a csv into a pandas dataframe and it has five columns. Certain rows have duplicate values only in the second column, i want to remove these rows from the data

相关标签:

2条回答

爱一瞬间的悲伤

2020-12-09 20:50

In my case the issue was that I was concatenating dfs with columns of different types:

import pandas as pd

s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2])
df = df.reset_index(drop=True)
df.drop_duplicates(inplace=True)

# 2 rows
print(df)

# int
print(type(df.at[0, 'code']))
# string
print(type(df.at[1, 'code']))

# Fix:
df['code'] = df['code'].astype(str)
df.drop_duplicates(inplace=True)

# 1 row
print(df)

0 讨论(0)

太阳男子

2020-12-09 21:09
As mentioned in the comments, drop and drop_duplicates creates a new DataFrame, unless provided with an inplace argument. All these options would work:
```
df = df.drop(dropRows)
df = df.drop_duplicates('b') #this doesnt work either
df.drop(dropRows, inplace = True)
df.drop_duplicates('b', inplace = True)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...