Question
In a pandas DataFrame, I am trying to drop duplicates across multiple columns. Many of the values in each row are NaN.
This is only an example. The real data is a mixed bag, so many different combinations exist.
df.drop_duplicates()
IDnum name formNumber
1 NaN AP GROUP 028-11964
2 1364615.0 AP GROUP NaN
3 NaN AP GROUP NaN
Desired output:
IDnum name formNumber
1 1364615.0 AP GROUP 028-11964
EDIT:
If the output of df.drop_duplicates() looks like this instead, would the solution change?
df.drop_duplicates()
IDnum name formNumber
0 NaN AP GROUP 028-11964
1 1364615.0 AP GROUP 028-11964
2 1364615.0 AP GROUP NaN
3 NaN AP GROUP NaN
Answer 1:
You can use groupby + first:
df.groupby('name',as_index=False).first()
Out[206]:
name IDnum formNumber
0 AP GROUP 1364615.0 028-11964
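This works because GroupBy.first() takes the first non-null value of each column within a group, so rows that are partly NaN collapse into one filled row. Below is a minimal, self-contained sketch that rebuilds the frame from the EDIT in the question. The variable names are illustrative, and it assumes name is the column that identifies duplicates.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown in the question's EDIT.
df = pd.DataFrame({
    "IDnum": [np.nan, 1364615.0, 1364615.0, np.nan],
    "name": ["AP GROUP"] * 4,
    "formNumber": ["028-11964", "028-11964", np.nan, np.nan],
})

# first() skips NaN, so each column yields its first non-null value per group.
result = df.groupby("name", as_index=False).first()
print(result)
```

If the first non-null values of different columns come from different rows, they are still merged into the same output row, which is the behaviour asked for here.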
Answer 2:
You need:
df.bfill().ffill().drop_duplicates()
Output:
IDnum name formNumber
0 1364615.0 AP GROUP 028-11964
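Note that filling across the whole frame only gives the intended result when every row belongs to the same entity, as in the example. With more than one name, values would be copied from one entity into another. The sketch below is one possible per-group variant, not part of the original answer. The "OTHER CO" row and its form number are made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's rows plus one hypothetical row for a second entity.
df = pd.DataFrame({
    "IDnum": [np.nan, 1364615.0, np.nan, np.nan],
    "name": ["AP GROUP", "AP GROUP", "AP GROUP", "OTHER CO"],
    "formNumber": ["028-11964", np.nan, np.nan, "999-00000"],
})

# Frame-wide fill: values leak across entities (OTHER CO picks up AP GROUP's IDnum).
leaked = df.bfill().ffill()

# Per-group fill: back-fill then forward-fill inside each name only.
cols = ["IDnum", "formNumber"]
filled = df.copy()
filled[cols] = df.groupby("name")[cols].transform(lambda s: s.bfill().ffill())

# Rows within a group are now identical, so drop_duplicates keeps one per name.
deduped = filled.drop_duplicates()
print(deduped)
```

Values that are missing in every row of a group stay NaN, since there is nothing within the group to fill them from.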
Source: https://stackoverflow.com/questions/51217780/pandas-drop-duplicates-ignoring-nan