GroupBy pandas DataFrame and select most common value

梦谈多话 2020-11-22 07:59

I have a data frame with three string columns. I know that only one value in the third column is valid for every combination of the first two. To clean the data I have to

10 answers
  •  感动是毒
    2020-11-22 08:54

    You can use value_counts() to get a Series of counts sorted in descending order, then take its first index label, which is the most common value:

    import pandas as pd

    source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                           'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                           'Short name': ['NY', 'New', 'Spb', 'NY']})

    # For each (Country, City) group, keep the most frequent value of every other column
    source.groupby(['Country', 'City']).agg(lambda x: x.value_counts().index[0])
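For readers who want to see what this produces, here is a minimal sketch of the same idea using the sample data from this answer. The helper name `most_common` and the variable `cleaned` are illustrative choices, not part of the original answer, and `reset_index()` is added only to turn the group keys back into ordinary columns.

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['USA', 'USA', 'Russia', 'USA'],
    'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
    'Short name': ['NY', 'New', 'Spb', 'NY'],
})


def most_common(values: pd.Series):
    # Hypothetical helper: value_counts() sorts by count, descending,
    # so the first index label is the most frequent value in the group.
    return values.value_counts().index[0]


cleaned = (
    source.groupby(['Country', 'City'])['Short name']
    .agg(most_common)
    .reset_index()  # turn the group keys back into regular columns
)
print(cleaned)
```

With this data, the ('USA', 'New-York') group holds 'NY' twice and 'New' once, so 'NY' is kept; the single Russian row keeps 'Spb'.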
    

    If you also want to apply other aggregation functions in the same .agg() call, use named aggregation:

    # Add a new column, account
    source['account'] = [1, 2, 3, 3]

    source.groupby(['Country', 'City']).agg(
        mod=('Short name', lambda x: x.value_counts().index[0]),
        avg=('account', 'mean'),
    )
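As a possible variation (not what the answer above uses), `Series.mode()` can stand in for `value_counts().index[0]`. The sketch below assumes every group has at least one non-null value, since `.iloc[0]` on an empty result would raise an error. The variable name `result` is illustrative.

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['USA', 'USA', 'Russia', 'USA'],
    'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
    'Short name': ['NY', 'New', 'Spb', 'NY'],
    'account': [1, 2, 3, 3],
})

result = (
    source.groupby(['Country', 'City'])
    .agg(
        # mode() returns a Series of the most frequent value(s); take the first one.
        # Assumption: each group contains at least one non-null value.
        mod=('Short name', lambda x: x.mode().iloc[0]),
        avg=('account', 'mean'),
    )
    .reset_index()
)
print(result)
```

On the sample data this gives 'NY' with an average account of 2.0 for ('USA', 'New-York'), and 'Spb' with 3.0 for ('Russia', 'Sankt-Petersburg'). The two approaches may differ when several values tie for most frequent, so it is worth checking which tie-breaking behavior suits your data.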
    
