Collapsing rows in a Pandas dataframe

梦谈多话 2021-02-09 20:27

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, b

2 answers
  •  無奈伤痛
    2021-02-09 20:37

    You can use groupby and aggregate each column with ''.join, sum or max:

    # If the blank values are NaN, replace them with '' first
    df = df.fillna('')
    
    df = df.groupby('ID').agg(''.join)
    print(df)
         apples  pears  oranges
    ID                         
    101                 oranges
    134  apples  pears         
    576          pears  oranges
    837  apples   
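
    For a self-contained run of the ''.join approach, here is a minimal sketch. The question's input frame is not shown, so the six input rows below are an assumption, chosen only so that the result has the same four IDs and values as the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical input: these rows are assumed, the original data is not shown.
df = pd.DataFrame({
    "ID": [101, 134, 134, 576, 576, 837],
    "apples": [np.nan, "apples", np.nan, np.nan, np.nan, "apples"],
    "pears": [np.nan, np.nan, "pears", "pears", np.nan, np.nan],
    "oranges": ["oranges", np.nan, np.nan, np.nan, "oranges", np.nan],
})

# Replace NaN with '' so that every cell is a string that join can handle,
# then concatenate the strings of each column within each ID group.
collapsed = df.fillna("").groupby("ID").agg("".join)
print(collapsed)
```

    The fillna('') step matters: ''.join raises a TypeError if a group still contains NaN, because NaN is a float and not a string.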
    

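    sum and max also work here: sum concatenates the strings in each group, and max keeps the lexicographically largest one, so a non-empty string wins over '':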

    df = df.fillna('')
    df = df.groupby('ID').sum()
    # Alternatively, use max
    # df = df.groupby('ID').max()
    print(df)
         apples  pears  oranges
    ID                         
    101                 oranges
    134  apples  pears         
    576          pears  oranges
    837  apples     
    

    If you also need to remove duplicates per group and per column, add unique:

    df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
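
    To see what unique changes, here is a small sketch comparing the two aggregations. The frame below is hypothetical (it is not the question's data) and deliberately repeats a string within one ID group.

```python
import pandas as pd

# Hypothetical data: ID 134 holds 'apples' twice and 'pears' twice.
df = pd.DataFrame({
    "ID": [134, 134, 134, 576],
    "apples": ["apples", "apples", "", ""],
    "pears": ["", "pears", "pears", "pears"],
})

# Plain join keeps every occurrence, so repeated strings are concatenated.
plain = df.groupby("ID").agg("".join)

# Joining only the unique values of each group drops the repeats.
deduped = df.groupby("ID").agg(lambda x: "".join(x.unique()))

print(plain)
print(deduped)
```

    With the plain join, ID 134 ends up with 'applesapples' in the apples column. With unique, each distinct string appears once.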
    
