GroupBy pandas DataFrame and select most common value

梦谈多话 2020-11-22 07:59

I have a data frame with three string columns. I know that only one value in the third column is valid for each combination of the first two. To clean the data I have to

10 answers
  •  谎友^ (original poster)
    2020-11-22 08:37

    If you want an approach that does not depend on value_counts or scipy.stats, you can use Counter from the standard-library collections module

    from collections import Counter
    # Count every value, then return the one with the highest count
    get_most_common = lambda values: max(Counter(values).items(), key=lambda x: x[1])[0]
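
To make the behaviour of this helper concrete, here is a small sketch that runs it on plain lists, outside pandas. The sample lists are made up for illustration. It also shows what happens on a tie: Counter keeps insertion order and max returns the first maximal item it meets, so the value seen first wins.

```python
from collections import Counter

# Same helper as in the answer above
get_most_common = lambda values: max(Counter(values).items(), key=lambda x: x[1])[0]

# Illustrative input: 'NY' occurs twice, 'New' once
counts = Counter(['NY', 'New', 'NY'])
winner = get_most_common(['NY', 'New', 'NY'])

# Tie: both values occur once, so the first one seen is returned
tie_winner = get_most_common(['NY', 'New'])

print(counts)
print(winner, tie_winner)
```

If a tie should be handled differently (for example by raising an error), that check would have to be added explicitly; the helper itself picks silently.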
    

    This can be applied to the example from the question like this:

    import pandas as pd

    src = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                        'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                        'Short_name': ['NY', 'New', 'Spb', 'NY']})

    # One row per (Country, City) pair, holding the most common Short_name
    src.groupby(['Country', 'City']).agg(get_most_common)
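
As a rough sketch of how the result might be used, the snippet below repeats the example end to end. It first builds a lookup table with one row per group, then, as an assumption about what "cleaning" means here, writes the most common value back onto every original row with transform. The variable names cleaned and filled are hypothetical.

```python
from collections import Counter

import pandas as pd

# Same helper as in the answer above
get_most_common = lambda values: max(Counter(values).items(), key=lambda x: x[1])[0]

src = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                    'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                    'Short_name': ['NY', 'New', 'Spb', 'NY']})

grouped = src.groupby(['Country', 'City'])['Short_name']

# One row per (Country, City) pair with its most common Short_name
cleaned = grouped.agg(get_most_common).reset_index()

# Same value broadcast back to every row of the original frame
filled = grouped.transform(get_most_common)

print(cleaned)
print(filled.tolist())
```

agg collapses each group to a single row, which suits a lookup table; transform keeps the original row count, which suits overwriting the column in place.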
    
