Question
I'm new to DataFrames and I want to group by multiple columns, then compute both the sum and the count of the last column. For example:
import numpy as np
import pandas as pd
s = pd.DataFrame(np.array([[1, 2, 3, 4], [3, 4, 7, 6], [3, 4, 5, 6], [1, 2, 3, 7]]), columns=['a', 'b', 'c', 'd'])
   a  b  c  d
0  1  2  3  4
1  3  4  7  6
2  3  4  5  6
3  1  2  3  7
I want to group on a, b and c but then sum on d and count the elements within the group.
I can count by
s.groupby(by=["a", "b", "c"])["d"].count()
a  b  c
1  2  3    2
3  4  5    1
      7    1
And I can sum by
s.groupby(by=["a", "b", "c"])["d"].sum()
a  b  c
1  2  3    11
3  4  5     6
      7     6
However, I want to combine them so that the resulting DataFrame has both the sum and count columns:
a  b  c  sum  count
1  2  3   11      2
3  4  5    6      1
      7    6      1
Answer 1:
You can use aggregate, or its shorter alias agg, and pass a list of aggregations:
# Pass aggregation names as strings; newer pandas versions warn when the builtin sum is passed.
print(s.groupby(by=["a", "b", "c"])["d"].agg(['sum', 'count']))
#print(s.groupby(by=["a", "b", "c"])["d"].aggregate(['sum', 'count']))
       sum  count
a b c
1 2 3   11      2
3 4 5    6      1
    7    6      1
See the pandas documentation on groupby aggregation.
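If a flat DataFrame with a, b and c as ordinary columns is wanted, as in the layout shown in the question, named aggregation followed by reset_index is one option. This is a minimal sketch rather than part of the original answer; the variable names df and result are illustrative, and it assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Same data as in the question, built from a plain nested list.
df = pd.DataFrame(
    [[1, 2, 3, 4], [3, 4, 7, 6], [3, 4, 5, 6], [1, 2, 3, 7]],
    columns=["a", "b", "c", "d"],
)

# Named aggregation: each keyword is an output column name,
# each value is a (source column, aggregation) pair.
result = (
    df.groupby(["a", "b", "c"])
    .agg(sum=("d", "sum"), count=("d", "count"))
    .reset_index()  # turn the group keys back into regular columns
)
print(result)
```

The result has the columns a, b, c, sum and count, with one row per group.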
The difference between size and count is that size includes NaN values in the group total, while count excludes them.
If you also need to count NaN values, use size:
print(s.groupby(by=["a", "b", "c"])["d"].agg(['sum', 'size']))
       sum  size
a b c
1 2 3   11     2
3 4 5    6     1
    7    6     1
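The sample data has no missing values, so size and count give the same numbers here. To see them diverge, here is a small sketch on a hypothetical variant of the data (df_nan is an illustrative name, not from the question) in which one d value in the (1, 2, 3) group is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: the last row's "d" is missing.
df_nan = pd.DataFrame(
    {
        "a": [1, 3, 3, 1],
        "b": [2, 4, 4, 2],
        "c": [3, 7, 5, 3],
        "d": [4.0, 6.0, 6.0, np.nan],
    }
)

# count skips the NaN in group (1, 2, 3); size counts every row in the group.
out = df_nan.groupby(["a", "b", "c"])["d"].agg(["sum", "count", "size"])
print(out)
```

For the (1, 2, 3) group, count is 1 while size is 2, and sum ignores the NaN and returns 4.0.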
Source: https://stackoverflow.com/questions/39409180/create-two-aggregate-columns-by-group-by-pandas