Question
Suppose the DataFrame df is grouped by A, B, and C, and looks something like this:
A  B      C   D
1  53704  hf  51602
              51602
              53802
          ss  53802
              53802
2  12811  hf  54205
          hx  50503
I have tried the following, which is similar to something from another post:
df.groupby([df['A'], df['B'], df['C']]).drop_duplicates(cols='D')
This is obviously incorrect, as it produces an empty DataFrame. I've also tried another variation with drop_duplicates that simply deletes all duplicates from 'D', no matter what group they are in. The output I'm looking for is:
A  B      C   D
1  53704  hf  51602
              53802
          ss  53802
2  12811  hf  54205
          hx  50503
That is, duplicates should be dropped only when they fall within the same A/B/C combination.
Answer 1:
Assuming these are just columns, you can use drop_duplicates directly:
In [11]: df.drop_duplicates(subset=list('ABCD'))
Out[11]:
   A      B   C      D
0  1  53704  hf  51602
2  1  53704  hf  53802
3  1  53704  ss  53802
5  2  12811  hf  54205
6  2  12811  hx  50503
If you're interested in duplicates across all columns, you don't need to specify any columns:
In [12]: df.drop_duplicates()
Out[12]:
   A      B   C      D
0  1  53704  hf  51602
2  1  53704  hf  53802
3  1  53704  ss  53802
5  2  12811  hf  54205
6  2  12811  hx  50503
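For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, so the row order and dtypes are assumptions. Note that in current pandas the keyword is `subset`; `cols` was the name used in older releases. The second half covers the case where A, B, and C are index levels rather than columns, which is not something the question confirms, so treat it as one possible approach rather than the only one.

```python
import pandas as pd

# Data reconstructed by hand from the question's table (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 2, 2],
    "B": [53704, 53704, 53704, 53704, 53704, 12811, 12811],
    "C": ["hf", "hf", "hf", "ss", "ss", "hf", "hx"],
    "D": [51602, 51602, 53802, 53802, 53802, 54205, 50503],
})

# Case 1: A, B, C, D are ordinary columns.
# A row is dropped only if all four values repeat an earlier row,
# so duplicates in D are removed per A/B/C combination.
deduped = df.drop_duplicates(subset=["A", "B", "C", "D"])
print(deduped)

# Case 2 (assumed layout): A, B, C form a MultiIndex and D is the only column.
# drop_duplicates ignores the index, so move the levels into columns first,
# drop the duplicates, then restore the index.
indexed = df.set_index(["A", "B", "C"])
deduped_indexed = (
    indexed.reset_index()
    .drop_duplicates()
    .set_index(["A", "B", "C"])
)
print(deduped_indexed)
```

No groupby is needed here: treating A, B, C, and D together as the duplicate key gives the same result as deduplicating D within each group.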
Source: https://stackoverflow.com/questions/19556165/how-can-i-drop-duplicate-data-in-a-single-column-group-wise-in-pandas