Complex Grouping of dataframe with operations and creation of new columns

Submitted by 强颜欢笑 on 2019-12-11 18:25:16

Question


I have a question and could not find a good answer that I can apply. It seems to be more complex than I thought:

This is my current dataframe df=

[customerid, visit_number, date,        purchase_amount]
[1,          38,           01-01-2019,  40             ]
[1,          39,           01-03-2019,  20             ]
[2,          10,           01-02-2019,  60             ]
[2,          14,           01-05-2019,  0              ]
[3,          10,           01-01-2019,  5              ]

I want to aggregate this table so that I end up with one row per customer, plus additional columns derived from the original ones, like this:

df_new=

[customerid, visits,      days,              purchase_amount]
[1,          2,           3,                 60             ]
[2,          5,           4,                 60             ]
[3,          1,           1,                 5              ]

Note that if a customer has no other date or visit to compare against, those metrics are always 1 (see customerid=3).

As I said, I looked around for days but could not find much help. I hope someone can guide me. Thank you very much.


Answer 1:


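You can use groupby with diff, plus groupby.agg for the total:

import pandas as pd

# Dates such as 01-03-2019 are parsed month-first by default, i.e. as January 3rd.
df['date']=pd.to_datetime(df['date'])
g=df.groupby('customerid')
# Row-to-row differences within each customer; the first row of every group is NaN/NaT.
df_new=pd.DataFrame({'customerid':df['customerid'],'visit_number':g['visit_number'].diff(),'date':g['date'].diff()})
# Rows that carry a difference sort first (NaT sorts last), so one row per customer survives.
df_new=df_new.sort_values('date').drop_duplicates('customerid').reset_index(drop=True)
# Total purchase amount per customer, looked up by customerid.
df_new['purchase_amount']=df_new['customerid'].map(g['purchase_amount'].agg('sum'))
df_new['visit_number']=df_new['visit_number']+1
df_new['date']=df_new['date']+pd.Timedelta('1 days')
df_new=df_new.rename(columns={'visit_number':'visits','date':'days'}).reindex(columns=['customerid','visits','days','purchase_amount'])
# Customers with a single row have nothing to compare against, so both metrics default to 1.
df_new['visits']=df_new['visits'].fillna(1)
df_new['days']=df_new['days'].fillna(pd.Timedelta('1 days'))
print(df_new)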


     customerid  visits   days  purchase_amount
0           1     2.0   3 days               60
1           2     5.0   4 days               60
2           3     1.0   1 days                5

Alternative solution:

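import pandas as pd

df['date']=pd.to_datetime(df['date'])
# Move customerid into the index so per-row and per-customer results align on it.
df=df.set_index('customerid')
g=df.groupby(level='customerid')
# Row-to-row differences within each customer (NaN/NaT on each customer's first row).
df2=pd.concat([g['visit_number'].diff(),g['date'].diff()],axis=1)
# Keep only rows with an actual difference; this assumes at most two rows per customer.
df2=df2.loc[df2['visit_number'].notnull()].copy()
df2['visit_number']=df2['visit_number']+1
df2['date']=df2['date']+pd.Timedelta('1 days')
df3=g.agg({'purchase_amount':'sum'})
df_new=pd.concat([df2,df3],sort=False,axis=1).rename(columns={'visit_number':'visits','date':'days'}).reset_index()
# Customers with a single row are missing from df2, so both metrics default to 1.
df_new['visits']=df_new['visits'].fillna(1)
df_new['days']=df_new['days'].fillna(pd.Timedelta('1 days'))
print(df_new)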
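
Both snippets above take row-to-row differences and then reduce them to one row per customer. If the metrics are read as "last value minus first value, plus one" per customer (an assumption inferred from the sample data, where visits = 39 - 38 + 1 = 2 and days covers January 1 to January 3 inclusive), a shorter sketch using per-group max and min may be enough. The sample frame is rebuilt here so the snippet runs on its own, and the explicit month-day-year format is also an assumption about how the dates are meant to be read.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customerid": [1, 1, 2, 2, 3],
    "visit_number": [38, 39, 10, 14, 10],
    "date": ["01-01-2019", "01-03-2019", "01-02-2019", "01-05-2019", "01-01-2019"],
    "purchase_amount": [40, 20, 60, 0, 5],
})
# Assumption: dates are month-day-year, so 01-03-2019 is January 3rd.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")

g = df.groupby("customerid")

# Each right-hand side is a Series indexed by customerid, so the columns align.
df_new = pd.DataFrame({
    # Span of visit numbers, counting both ends; a single row gives 1.
    "visits": g["visit_number"].max() - g["visit_number"].min() + 1,
    # Span of dates in whole days, counting both ends; a single row gives 1.
    "days": (g["date"].max() - g["date"].min()).dt.days + 1,
    # Total purchase amount per customer.
    "purchase_amount": g["purchase_amount"].sum(),
}).reset_index()

print(df_new)
# visits: 2, 5, 1; days: 3, 4, 1; purchase_amount: 60, 60, 5
```

With this reading no fillna step is needed, because a customer with a single row gets max - min = 0 and therefore 1 for both metrics. The visits and days columns come out as plain integers rather than floats and Timedeltas, and customers with more than two rows are reduced to a single row as well.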


Source: https://stackoverflow.com/questions/57897642/complex-grouping-of-dataframe-with-operations-and-creation-of-new-columns
