Sum & count of a column based on the content of the last value in each group after group-by

六眼飞鱼酱① 提交于 2019-12-24 06:49:13

问题


I have a dataframe as below

id       val    type
aa         0    C
aa         1    T
aa         2    T
aa         3    T
aa         0    M
aa         1    M
aa         2    C
aa         3    M
bbb        0    C
bbb        1    T
bbb        2    T
bbb        3    T
bbb        0    M
bbb        1    M
bbb        2    C
bbb        3    T
cccccc     0    C
cccccc     1    T
cccccc     2    T
cccccc     3    T
cccccc     0    M
cccccc     1    M
cccccc     0    C
cccccc     1    C

I want to do a groupby "ID" and then sum & count the rows in column "val", however the rows that should be summed are only the rows that contain the "type" same as the last value of column "type" in each group.

For example the last row of group has "type" M, so only the rows with "type" M in the group be summed & counted. So, values 0,1 & 3 needs to be summed & the count is 3.

The expected output for the df above is as below. Column "type" in the output is not mandatory, it can be left out if it takes more time to get that. I am showing it here just o give clarity on what i want to achieve.

id     val  count   type
aa       4  3       M
bbb      9  4       T
cccccc   1  3       C


回答1:


Use GroupBy.transform with last for filtering and then aggregate by named aggregations, working for pandas 0.25+:

df = (df[df['type'].eq(df.groupby('id')['type'].transform('last'))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C

Another solution with Series.map by Series created by DataFrame.drop_duplicates and DataFrame.set_index:

s = df.drop_duplicates('id', keep='last').set_index('id')['type']
df = (df[df['type'].eq(df['id'].map(s))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C



回答2:


# find the last type by group
last_type = df[["id", "type"]].groupby("id").tail(1)


df['count'] = 1

# inner merge with the last type (i.e. keep only the last type by group)
df.merge(last_type, on=["id", "type"], how="inner").groupby(["id", "type"]).agg({'val':'sum', 'count': 'sum'}).reset_index()

Output

       id type  val  count
0      aa    M    4      3
1     bbb    T    9      4
2  cccccc    C    1      3


来源:https://stackoverflow.com/questions/59388451/sum-count-of-a-column-based-on-the-content-of-the-last-value-in-each-group-aft

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!