I have a data frame with three string columns. I know that the only one value in the 3rd column is valid for every combination of the first two. To clean the data I have to
If you want another approach for solving it that is does not depend on value_counts
or scipy.stats
you can use the Counter
collection
from collections import Counter
get_most_common = lambda values: max(Counter(values).items(), key = lambda x: x[1])[0]
Which can be applied to the above example like this
src = pd.DataFrame({'Country' : ['USA', 'USA', 'Russia','USA'],
'City' : ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
'Short_name' : ['NY','New','Spb','NY']})
src.groupby(['Country','City']).agg(get_most_common)