Pandas groupby on a column of lists

Posted by 别说谁变了你拦得住时间么 on 2019-12-11 16:50:00

Question


I have a pandas dataframe with a column that contains lists:

import pandas as pd
df = pd.DataFrame({'List': [['once', 'upon'], ['once', 'upon'], ['a', 'time'], ['there', 'was'], ['a', 'time']], 'Count': [2, 3, 4, 1, 2]})

Count   List
2    [once, upon]
3    [once, upon]
4    [a, time]
1    [there, was]
2    [a, time]

How can I group the rows that have identical List values and sum their Count values? The expected result is:

Count   List
5     [once, upon]
6     [a, time]
1     [there, was]

I've tried:

df.groupby('List')['Count'].sum()

which results in:

TypeError: unhashable type: 'list'

Answer 1:


One way is to convert the lists to tuples first. This works because DataFrame.groupby requires its group keys to be hashable: tuples are immutable and hashable (as long as their elements are), whereas lists are mutable and therefore unhashable.

res = df.groupby(df['List'].map(tuple))['Count'].sum()

Result:

List
(a, time)       6
(once, upon)    5
(there, was)    1
Name: Count, dtype: int64
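The hashability point above can be checked directly in plain Python, without pandas. This is only a small illustrative sketch; the variable names are made up for the example and are not part of the original answer.

```python
# A tuple of hashable items can be hashed, so it can serve as a group key.
key = ('once', 'upon')
tuple_hash_ok = isinstance(hash(key), int)

# A list is mutable and defines no hash, so hashing it raises TypeError,
# which is the same kind of error the groupby call in the question hits.
try:
    hash(['once', 'upon'])
    list_hash_ok = True
    error_message = ''
except TypeError as exc:
    list_hash_ok = False
    error_message = str(exc)

print(tuple_hash_ok, list_hash_ok, error_message)
```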

If you need the result as lists in a dataframe, you can convert back:

res = df.groupby(df['List'].map(tuple))['Count'].sum().reset_index()
res['List'] = res['List'].map(list)

#            List  Count
# 0     [a, time]      6
# 1  [once, upon]      5
# 2  [there, was]      1
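Note that the output above is sorted by key, while the expected result in the question lists the groups in order of first appearance. A possible end-to-end sketch, assuming that ordering matters to you, passes sort=False to groupby and then restores the question's column order. The names keys and out are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'List': [['once', 'upon'], ['once', 'upon'], ['a', 'time'],
             ['there', 'was'], ['a', 'time']],
    'Count': [2, 3, 4, 1, 2],
})

# Group on a hashable tuple copy of each list; sort=False keeps the groups
# in order of first appearance instead of sorting the keys.
keys = df['List'].map(tuple)
out = df.groupby(keys, sort=False)['Count'].sum().reset_index()

# Turn the tuple keys back into lists and match the question's column order.
out['List'] = out['List'].map(list)
out = out[['Count', 'List']]
print(out)
```

Grouping by the mapped Series leaves the original df untouched, so the List column still holds lists afterwards.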


Source: https://stackoverflow.com/questions/49434712/pandas-groupby-on-a-column-of-lists
