How can I drop duplicate data in a single column, group-wise in pandas?

Question

Suppose the df is grouped by A, B, and C, and looks something like this:

    A    B      C    D
    1    53704  hf   51602
                     51602   
                     53802
                ss   53802
                     53802
    2    12811  hf   54205
                hx   50503

I have tried the following, which is similar to something from another post:

    df.groupby([df['A'], df['B'], df['C']]).drop_duplicates(cols='D')

This is obviously incorrect, as it produces an empty dataframe. I've also tried another variation with drop_duplicates that simply deletes all duplicates from 'D', no matter which group they are in. The output I'm looking for is:

    A    B      C   D
    1    53704  hf  51602
                    53802
                ss  53802
    2    12811  hf  54205
                hx  50503

So that duplicates are only dropped when they are grouped into the same A/B/C combination.

Answer 1:

Assuming these are just columns, you can use drop_duplicates directly. In current pandas the keyword is subset (the older cols keyword has been removed):

In [11]: df.drop_duplicates(subset=list('ABCD'))
Out[11]: 
   A      B   C      D
0  1  53704  hf  51602
2  1  53704  hf  53802
3  1  53704  ss  53802
5  2  12811  hf  54205
6  2  12811  hx  50503
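
For anyone who wants to reproduce this, here is a minimal sketch. The frame is rebuilt by hand from the table in the question, so the exact construction is an assumption rather than the asker's real data; it also assumes A, B, and C are ordinary columns, as the answer does.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption,
# not the asker's actual frame).
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 2, 2],
    "B": [53704, 53704, 53704, 53704, 53704, 12811, 12811],
    "C": ["hf", "hf", "hf", "ss", "ss", "hf", "hx"],
    "D": [51602, 51602, 53802, 53802, 53802, 54205, 50503],
})

# Keep the first row of every distinct A/B/C/D combination, so a value
# of D is only dropped when it repeats inside the same A/B/C group.
deduped = df.drop_duplicates(subset=["A", "B", "C", "D"])
print(deduped)
```

Because the group keys are part of the subset, 53802 survives once under hf and once under ss.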

If you're interested in duplicates across all columns, you don't need to specify a subset:

In [12]: df.drop_duplicates()
Out[12]: 
   A      B   C      D
0  1  53704  hf  51602
2  1  53704  hf  53802
3  1  53704  ss  53802
5  2  12811  hf  54205
6  2  12811  hx  50503
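
The answer assumes A, B, and C are plain columns. If they are instead index levels, which the blank cells in the question's table may suggest, drop_duplicates only looks at D, because it ignores the index. That would explain the "deletes all duplicates from 'D'" result described in the question. A possible workaround, sketched here on hand-built sample data (an assumption, not the asker's real frame), is to move the levels into columns, deduplicate, and restore the index:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data with A, B, C as a MultiIndex.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 2, 2],
    "B": [53704, 53704, 53704, 53704, 53704, 12811, 12811],
    "C": ["hf", "hf", "hf", "ss", "ss", "hf", "hx"],
    "D": [51602, 51602, 53802, 53802, 53802, 54205, 50503],
}).set_index(["A", "B", "C"])

# drop_duplicates ignores the index, so this compares D alone and also
# removes the 53802 that belongs to the "ss" group.
naive = df.drop_duplicates()

# Move the index levels into columns, deduplicate on all four, then restore the index.
keys = ["A", "B", "C"]
deduped = (
    df.reset_index()
      .drop_duplicates(subset=keys + ["D"])
      .set_index(keys)
)
print(deduped)
```

Here naive keeps four rows while deduped keeps the five rows the question asks for.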

Source: https://stackoverflow.com/questions/19556165/how-can-i-drop-duplicate-data-in-a-single-column-group-wise-in-pandas
