Question
I have the following dataframe:
traffic_type date region total_views
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 200
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40
I need to group by traffic_type, date, and region, and then, on the rows where traffic_type is total, add a desktop_share column equal to the total_views of the desktop row divided by the total_views of the total row. The column should stay blank on all other rows. Expected output:
traffic_type date region total_views desktop_share
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 200 0.25
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40 0.5
I have a long approach that works, but I am looking for something more concise based on numpy or just pandas. My solution:
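# df2 is the original dataframe shown above
# Keep only the desktop rows and the columns needed for the join
df1 = df2.loc[df2.traffic_type == 'desktop']
df1 = df1[['date', 'region', 'total_views']]
# Attach each group's desktop views to every row as total_views_desktop
df1 = df2.merge(df1, how='left', on=['region', 'date'], suffixes=('', '_desktop'))
# Compute the share on the total rows only
df1 = df1.loc[df1.traffic_type == 'total']
df1['desktop_share'] = df1['total_views_desktop'] / df1['total_views']
df1 = df1[['date', 'region', 'desktop_share', 'traffic_type']]
# Merge the share back so the non-total rows get NaN
dfinal = df2.merge(df1, how='left', on=['region', 'date', 'traffic_type'])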
Answer 1:
One idea with pivoting:
df1 = df.pivot_table(index=['date','region'],
                     columns='traffic_type',
                     values='total_views',
                     aggfunc='sum')
print (df1)
traffic_type       desktop  mobileweb  total
date       region
01/04/2018 aug          50         60    200
           world        20         30     40
df2 = df1['desktop'].div(df1['total']).reset_index(name='desktop_share').assign(traffic_type='total')
df = df.merge(df2, how='left')
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50
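To reproduce the pivot idea end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the sample rows in the question, and the names `wide`, `shares` and `result` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (total for 'aug' taken as 200,
# which is the value the expected output relies on).
df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

# One row per (date, region), one column per traffic type.
wide = df.pivot_table(index=['date', 'region'],
                      columns='traffic_type',
                      values='total_views',
                      aggfunc='sum')

# desktop / total per group, labelled so it only matches the 'total' rows.
shares = (wide['desktop'].div(wide['total'])
          .reset_index(name='desktop_share')
          .assign(traffic_type='total'))

# Left merge on the shared columns (date, region, traffic_type);
# rows without a match keep NaN in desktop_share.
result = df.merge(shares, how='left')
print(result)
```

Because the merge key includes traffic_type, only the total rows receive a value, which is what leaves the other rows blank.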
Another idea with MultiIndex:
df1 = df.set_index(['traffic_type','date','region'])
a = df1.xs('desktop', drop_level=False).rename({'desktop':'total'})
b = df1.xs('total', drop_level=False)
df = df1.assign(desktop_share = a['total_views'].div(b['total_views'])).reset_index()
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50
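A third option, not from the original answer, avoids both the pivot and the merge by broadcasting the desktop views across each group with a grouped transform. This is only a sketch under the assumption that every (date, region) group has a desktop row. With 'sum', a group lacking one would yield 0 rather than NaN. The names `desktop` and `is_total` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (total for 'aug' taken as 200).
df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

# Blank out everything except desktop views, then spread the per-group
# desktop sum onto every row of that (date, region) group.
desktop = (df['total_views']
           .where(df['traffic_type'].eq('desktop'))
           .groupby([df['date'], df['region']])
           .transform('sum'))

# Keep the ratio only on the 'total' rows; all other rows become NaN.
is_total = df['traffic_type'].eq('total')
df['desktop_share'] = desktop.div(df['total_views']).where(is_total)
print(df)
```

This keeps the original row order and index untouched, since nothing is reshaped or merged.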
Source: https://stackoverflow.com/questions/62083576/pandas-row-filters-and-and-division-from-specific-rows-and-columns