How to conditionally remove duplicates from a pandas dataframe

梦如初夏 2021-01-02 02:41

Consider the following dataframe

import pandas as pd
df = pd.DataFrame({'A' : [1, 2, 3, 3, 4, 4, 5, 6, 7],
                   'B' : ['a','b','c','


        
3 Answers
  •  半阙折子戏
    2021-01-02 03:01

    If the goal is to only drop the NaN duplicates, a slightly more involved solution is needed.

    First, sort on A, B, and Col_1 so that, within each (A, B) group, rows with NaN in Col_1 move to the bottom (sort_values places NaN last by default). Then call drop_duplicates on A and B with keep='first':

    out = df.sort_values(['A', 'B', 'Col_1']).drop_duplicates(['A', 'B'], keep='first')
    print(out)
    
       A  B Col_1  Col_2
    0  1  a   NaN      2
    1  2  b     A      2
    2  3  c     A      3
    4  4  d     B      3
    6  5  e     B      4
    7  6  f   NaN      4
    8  7  g   NaN      5
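
Because the question's dataframe is cut off above, here is a minimal self-contained sketch of the same sort-then-drop idea on made-up data. The column values are assumptions chosen for illustration, not the asker's original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's original frame is truncated,
# so these values are illustrative only.
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3],
    'B': ['a', 'b', 'b', 'c', 'c'],
    'Col_1': [np.nan, np.nan, 'X', 'Y', np.nan],
})

# sort_values puts NaN last by default (na_position='last'), so within each
# (A, B) group a non-NaN Col_1 row, if one exists, sorts ahead of the NaN rows.
# drop_duplicates(keep='first') then keeps that leading row of each group.
out = (
    df.sort_values(['A', 'B', 'Col_1'])
      .drop_duplicates(['A', 'B'], keep='first')
)
print(out)
```

In this sketch the group (1, a) keeps its NaN row because it has no other candidate, while (2, b) and (3, c) keep their non-NaN rows. Note that the approach keeps exactly one row per (A, B) pair, so if a group held several distinct non-NaN Col_1 values, only the first in sort order would survive.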
    
