pandas - Merge nearly duplicate rows based on column value

小鲜肉 2020-11-27 11:34

I have a pandas dataframe with several rows that are near duplicates of each other, except for one value. My goal is to merge or "coalesce" these rows into a

3 Answers
  •  北海茫月
    2020-11-27 12:14

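    I think you can use groupby with agg: take 'first' for the columns that are identical within each group, and pass ', '.join as a custom function to concatenate the Use_Case values: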

    df = df.groupby('Name').agg({'Sid':'first', 
                                 'Use_Case': ', '.join, 
                                 'Revenue':'first' }).reset_index()
    
    # change column order
    print(df[['Name','Sid','Use_Case','Revenue']])
      Name   Sid           Use_Case Revenue
    0    A  xx01         Voice, SMS  $10.00
    1    B  xx02              Voice   $5.00
    2    C  xx03  Voice, SMS, Video  $15.00
    

    A nice alternative from the comments (thanks, Goyo): group by every column that stays constant within a group and join only Use_Case:

    df = df.groupby(['Name','Sid','Revenue'])['Use_Case'].apply(', '.join).reset_index()
    
    # change column order
    print(df[['Name','Sid','Use_Case','Revenue']])
      Name   Sid           Use_Case Revenue
    0    A  xx01         Voice, SMS  $10.00
    1    B  xx02              Voice   $5.00
    2    C  xx03  Voice, SMS, Video  $15.00
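
The question's input data is not shown above, so here is a minimal, self-contained sketch for trying both approaches. The sample rows are an assumption, reconstructed from the printed output; the names `cols`, `by_name` and `by_all` are only illustrative.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown in the answer;
# the question's actual data is not available here.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "C", "C", "C"],
    "Sid": ["xx01", "xx01", "xx02", "xx03", "xx03", "xx03"],
    "Use_Case": ["Voice", "SMS", "Voice", "Voice", "SMS", "Video"],
    "Revenue": ["$10.00", "$10.00", "$5.00", "$15.00", "$15.00", "$15.00"],
})

cols = ["Name", "Sid", "Use_Case", "Revenue"]

# Approach 1: group by Name only, keep the first Sid and Revenue,
# and join the Use_Case strings of each group.
by_name = (
    df.groupby("Name")
      .agg({"Sid": "first", "Use_Case": ", ".join, "Revenue": "first"})
      .reset_index()[cols]
)

# Approach 2: group by every column that is constant within a group,
# then join only Use_Case.
by_all = (
    df.groupby(["Name", "Sid", "Revenue"])["Use_Case"]
      .apply(", ".join)
      .reset_index()[cols]
)

print(by_name)
# Compare the cell values of the two results.
print(by_all.values.tolist() == by_name.values.tolist())
```

On data like this the two approaches should agree. They differ when Sid or Revenue is not actually constant for a given Name: the first approach silently keeps the first value, while the second produces a separate row for each distinct combination.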
    
