Pandas row filters and division from specific rows and columns

Submitted by 巧了我就是萌 on 2020-06-05 11:37:32

Question


I have the following DataFrame:

traffic_type    date        region   total_views
desktop         01/04/2018  aug      50
mobileweb       01/04/2018  aug      60
total           01/04/2018  aug      200
desktop         01/04/2018  world    20
mobileweb       01/04/2018  world    30
total           01/04/2018  world    40

I need to match rows on date and region and, for each row where traffic_type is total, create a desktop_share column equal to the total_views of the desktop row divided by the total_views of the total row in the same group. The column should be blank for all other rows:

traffic_type    date        region   total_views  desktop_share
desktop         01/04/2018  aug      50           
mobileweb       01/04/2018  aug      60
total           01/04/2018  aug      200          0.25
desktop         01/04/2018  world    20
mobileweb       01/04/2018  world    30
total           01/04/2018  world    40           0.5
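To try the answers below, the sample data can be built as follows. This is a minimal sketch that uses 200 for the aug total row, which is the value the expected output (0.25 = 50 / 200) and both answers assume.

```python
import pandas as pd

# Sample data from the question; the aug "total" row uses 200 so that
# desktop_share = 50 / 200 = 0.25, matching the expected output.
df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})
print(df)
```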

I have a lengthy approach that works, but I am looking for something more concise using NumPy or plain pandas. My solution:

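# df2 holds the input DataFrame shown above
# Desktop rows, keeping only the join keys and the view count
df1 = df2.loc[df2.traffic_type == 'desktop']
df1 = df1[['date', 'region', 'total_views']]
# Attach each (date, region)'s desktop views as total_views_desktop
df1 = df2.merge(df1, how='left', on=['region', 'date'], suffixes=('', '_desktop'))
# Keep only the "total" rows (copy to avoid SettingWithCopyWarning) and compute the share
df1 = df1.loc[df1.traffic_type == 'total'].copy()
df1['desktop_share'] = df1['total_views_desktop'] / df1['total_views']
df1 = df1[['date', 'region', 'desktop_share', 'traffic_type']]

# Merge the shares back onto every row; non-total rows get NaN
dfinal = df2.merge(df1, how='left', on=['region', 'date', 'traffic_type'])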

Answer 1:


One idea with pivoting:

df1 = df.pivot_table(index=['date','region'], 
                     columns='traffic_type', 
                     values='total_views', 
                     aggfunc='sum')
print(df1)
traffic_type       desktop  mobileweb  total
date       region                           
01/04/2018 aug          50         60    200
           world        20         30     40

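# desktop / total per (date, region), tagged so it only matches the 'total' rows
df2 = df1['desktop'].div(df1['total']).reset_index(name='desktop_share').assign(traffic_type='total')

# merge joins on the shared columns: date, region and traffic_type
df = df.merge(df2, how='left')
print(df)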
  traffic_type        date region  total_views  desktop_share
0      desktop  01/04/2018    aug           50            NaN
1    mobileweb  01/04/2018    aug           60            NaN
2        total  01/04/2018    aug          200           0.25
3      desktop  01/04/2018  world           20            NaN
4    mobileweb  01/04/2018  world           30            NaN
5        total  01/04/2018  world           40           0.50
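The pivot approach as a self-contained sketch, assuming the sample data from the question with 200 for the aug total. The only change from the answer is that the merge keys are spelled out with `on=`; without it, `merge` joins on all shared column names, which here are the same three columns.

```python
import pandas as pd

df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

# One row per (date, region), one column per traffic_type
wide = df.pivot_table(index=['date', 'region'], columns='traffic_type',
                      values='total_views', aggfunc='sum')

# desktop / total per (date, region), tagged so it only matches "total" rows
shares = (wide['desktop'].div(wide['total'])
          .reset_index(name='desktop_share')
          .assign(traffic_type='total'))

# Left merge keeps every original row; non-total rows get NaN
result = df.merge(shares, how='left', on=['date', 'region', 'traffic_type'])
print(result)
```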

Another idea, using a MultiIndex:

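df1 = df.set_index(['traffic_type','date','region'])

# desktop rows, relabelled 'total' so their index matches the total rows
a = df1.xs('desktop', drop_level=False).rename({'desktop':'total'})
b = df1.xs('total', drop_level=False)

# division and assign both align on the full index, so only 'total' rows get a value
df = df1.assign(desktop_share = a['total_views'].div(b['total_views'])).reset_index()
print(df)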
  traffic_type        date region  total_views  desktop_share
0      desktop  01/04/2018    aug           50            NaN
1    mobileweb  01/04/2018    aug           60            NaN
2        total  01/04/2018    aug          200           0.25
3      desktop  01/04/2018  world           20            NaN
4    mobileweb  01/04/2018  world           30            NaN
5        total  01/04/2018  world           40           0.50
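A further variant, not from the original answer, avoids both the pivot and the reshaped index by using `groupby(...).transform` to broadcast each group's desktop value to every row. This is a sketch assuming exactly one desktop row per (date, region), as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

is_desktop = df['traffic_type'].eq('desktop')
is_total = df['traffic_type'].eq('total')

# Keep total_views only on desktop rows (NaN elsewhere), then broadcast it to
# every row of its (date, region) group; 'first' skips NaN values.
desktop_views = (df['total_views'].where(is_desktop)
                 .groupby([df['date'], df['region']])
                 .transform('first'))

# Divide on every row, then blank out everything except the "total" rows
df['desktop_share'] = (desktop_views / df['total_views']).where(is_total)
print(df)
```

This keeps the original row order and never reshapes the frame, at the cost of computing a ratio on every row before masking it.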


Source: https://stackoverflow.com/questions/62083576/pandas-row-filters-and-and-division-from-specific-rows-and-columns
