pandas - Merge nearly duplicate rows based on column value

小鲜肉 2020-11-27 11:34

I have a pandas dataframe with several rows that are near duplicates of each other, except for one value. My goal is to merge or "coalesce" these rows into a

3 answers

北海茫月 (original poster)

2020-11-27 12:14

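I think you can use groupby on Name with agg, taking 'first' for the columns that are identical within each group (Sid, Revenue) and the custom function ', '.join for Use_Case, the column that differs: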

df = df.groupby('Name').agg({'Sid':'first', 
                             'Use_Case': ', '.join, 
                             'Revenue':'first' }).reset_index()

# change column order
print(df[['Name','Sid','Use_Case','Revenue']])
  Name   Sid           Use_Case Revenue
0    A  xx01         Voice, SMS  $10.00
1    B  xx02              Voice   $5.00
2    C  xx03  Voice, SMS, Video  $15.00
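
For anyone who wants to run the groupby/agg approach end to end, here is a minimal sketch. The question's input frame is not shown here, so the sample data below is an assumption reconstructed from the printed output, and the name merged is only illustrative:

```python
import pandas as pd

# Hypothetical input: reconstructed from the printed result, not taken from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "C", "C", "C"],
    "Sid": ["xx01", "xx01", "xx02", "xx03", "xx03", "xx03"],
    "Use_Case": ["Voice", "SMS", "Voice", "Voice", "SMS", "Video"],
    "Revenue": ["$10.00", "$10.00", "$5.00", "$15.00", "$15.00", "$15.00"],
})

# One row per Name: keep the first Sid and Revenue, join all Use_Case values.
merged = (
    df.groupby("Name")
      .agg({"Sid": "first", "Use_Case": ", ".join, "Revenue": "first"})
      .reset_index()
)

# Restore the original column order before printing.
print(merged[["Name", "Sid", "Use_Case", "Revenue"]])
```

Note that 'first' silently keeps the first value it sees, so this only makes sense when Sid and Revenue really are identical within each Name.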

Nice idea from the comments, thanks Goyo:

df = df.groupby(['Name','Sid','Revenue'])['Use_Case'].apply(', '.join).reset_index()

# change column order
print(df[['Name','Sid','Use_Case','Revenue']])
  Name   Sid           Use_Case Revenue
0    A  xx01         Voice, SMS  $10.00
1    B  xx02              Voice   $5.00
2    C  xx03  Voice, SMS, Video  $15.00
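
If the same Use_Case can appear more than once for a group, a plain ', '.join will repeat it. A possible variation, sketched under the assumption that repeats should be collapsed while keeping first-seen order, is shown below. The helper join_unique and the sample data are hypothetical and not part of the original answer:

```python
import pandas as pd

# Hypothetical input with a repeated Use_Case ("Voice" twice for Name A).
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B"],
    "Sid": ["xx01", "xx01", "xx01", "xx02"],
    "Use_Case": ["Voice", "SMS", "Voice", "Voice"],
    "Revenue": ["$10.00", "$10.00", "$10.00", "$5.00"],
})

def join_unique(values):
    """Join values with ', ', dropping repeats but keeping first-seen order."""
    # dict.fromkeys preserves insertion order and discards duplicate keys.
    return ", ".join(dict.fromkeys(values))

# Group on every column that is identical across the near-duplicate rows.
merged = (
    df.groupby(["Name", "Sid", "Revenue"])["Use_Case"]
      .apply(join_unique)
      .reset_index()
)

print(merged[["Name", "Sid", "Use_Case", "Revenue"]])
```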
