Create two aggregate columns by Group By Pandas

Posted by 老子叫甜甜 on 2019-12-07 17:55:24

Question


I'm new to DataFrames and I want to group by multiple columns, then compute both the sum and the count of the last column. For example:

import pandas as pd
s = pd.DataFrame([[1, 2, 3, 4], [3, 4, 7, 6], [3, 4, 5, 6], [1, 2, 3, 7]], columns=['a', 'b', 'c', 'd'])

   a  b  c  d
0  1  2  3  4
1  3  4  7  6
2  3  4  5  6
3  1  2  3  7

I want to group on a, b and c, then sum d and count the rows within each group. I can get the count with:

s.groupby(by=["a", "b", "c"])["d"].count()

    a  b  c
    1  2  3    2
    3  4  5    1
          7    1

And I can get the sum with:

s.groupby(by=["a", "b", "c"])["d"].sum()

a  b  c
1  2  3    11
3  4  5     6
      7     6

However, I want to combine the two so that the resulting DataFrame has both the sum and count columns:

    a  b  c   sum    count
    1  2  3    11     2
    3  4  5     6     1
          7     6     1

Answer 1:


You can use aggregate, or its shorter alias agg, and pass a list of aggregation names:

print(s.groupby(by=["a", "b", "c"])["d"].agg(['sum', 'count']))
#print(s.groupby(by=["a", "b", "c"])["d"].aggregate(['sum', 'count']))
       sum  count
a b c            
1 2 3   11      2
3 4 5    6      1
    7    6      1

Pandas documentation.
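The result above keeps a, b and c in the index. If plain columns are wanted, as in the layout the question asks for, one option is named aggregation followed by reset_index. This is a minimal sketch, assuming pandas 0.25 or later; the names df and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    [[1, 2, 3, 4], [3, 4, 7, 6], [3, 4, 5, 6], [1, 2, 3, 7]],
    columns=["a", "b", "c", "d"],
)

# Named aggregation: each keyword becomes an output column name,
# and its value is the aggregation applied to column d.
result = (
    df.groupby(["a", "b", "c"])["d"]
      .agg(sum="sum", count="count")
      .reset_index()  # move a, b, c from the index back into columns
)

print(result)
```

The keywords passed to agg only set the output column names, so they could just as well be total and n.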

The difference between size and count is:

size counts NaN values, count does not.

If you also need to count NaN values, use size instead of count:

print(s.groupby(by=["a", "b", "c"])["d"].agg(['sum', 'size']))
       sum  size
a b c           
1 2 3   11     2
3 4 5    6     1
    7    6     1
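With the data from the question, size and count give identical numbers because d has no missing values. The small sketch below uses a made-up frame (df_nan is hypothetical, not from the question) with one NaN in d to show where the two diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first group has one missing value in d
df_nan = pd.DataFrame({
    "a": [1, 1, 3],
    "b": [2, 2, 4],
    "c": [3, 3, 5],
    "d": [4.0, np.nan, 6.0],
})

out = df_nan.groupby(["a", "b", "c"])["d"].agg(["sum", "count", "size"])
print(out)
# For group (1, 2, 3): count is 1 (NaN skipped), size is 2 (all rows)
```

Note that sum also skips NaN by default, so the sum for the first group is 4.0.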


Source: https://stackoverflow.com/questions/39409180/create-two-aggregate-columns-by-group-by-pandas
