Sorting in a Pandas pivot_table

匿名 (未验证) 提交于 2019-12-03 01:05:01

问题:

I have been looking all over trying to figure out how to sort my pivot table correctly and I haven't had any luck.

    client          unit    task                hours   month 0   A               DVADA   Account Management  6.50    January      1   A               DVADA   Buying              1.25    January  2   A               DVADA   Meeting / Call      0.50    January  3   A               DVADA   Account Management  3.00    January  4   A               DVADA   Billing             2.50    February     5   A               DVADA   Account Management  6.50    February         6   A               DVADA   Buying              1.25    February     7   A               DVADA   Meeting / Call      0.50    February     8   A               DVADA   Account Management  3.00    February     9   A               DVADA   Billing             2.50    February 10  A               DVADA   Billing             2.50    December     11  A               DVADA   Account Management  6.50    December         12  A               DVADA   Buying              1.25    December     13  A               DVADA   Meeting / Call      0.50    December     14  A               DVADA   Account Management  3.00    December     15  A               DVADA   Billing             2.50    December 16  A               DVADA   Account Management  6.50    August       17  A               DVADA   Buying              1.25    August   18  A               DVADA   Meeting / Call      0.50    August   19  A               DVADA   Account Management  3.00    August 20  A               DVADA   Account Management  6.50    April        21  A               DVADA   Buying              1.25    April    22  A               DVADA   Meeting / Call      0.50    April    23  A               DVADA   Account Management  3.00    April 24  B               DVADA   Account Management  6.50    January      25  B               DVADA   Buying              1.25    January  26  B               DVADA   Meeting / Call      0.50    January  27  B               DVADA   Account Management  3.00    January  28  B               DVADA   Billing             2.50    February     29  B               DVADA   Account Management  6.50    February         30  B               DVADA   Buying              1.25    February     31  B               DVADA   Meeting / Call      0.50    February     32  B               DVADA   Account Management  3.00    February     33  B               DVADA   Billing             2.50    February 34  B               DVADA   Billing             2.50    December     35  B               DVADA   Account Management  6.50    December         36  B               DVADA   Buying              1.25    December     37  B               DVADA   Meeting / Call      0.50    December     38  B               DVADA   Account Management  3.00    December     39  B               DVADA   Billing             2.50    December 40  B               DVADA   Account Management  6.50    August       41  B               DVADA   Buying              1.25    August   42  B               DVADA   Meeting / Call      0.50    August   43  B               DVADA   Account Management  3.00    August 44  B               DVADA   Account Management  6.50    April        45  B               DVADA   Buying              1.25    April    46  B               DVADA   Meeting / Call      0.50    April    47  C               DVADA   Account Management  3.00    April 48  C               DVADA   Account Management  6.50    January      49  C               DVADA   Buying              1.25    January  50  C               DVADA   Meeting / Call      0.50    January  51  C               DVADA   Account Management  3.00    January  52  C               DVADA   Billing             2.50    February     53  C               DVADA   Account Management  6.50    February         54  C               DVADA   Buying              1.25    February     55  C               DVADA   Meeting / Call      0.50    February     56  C               DVADA   Account Management  3.00    February     57  C               DVADA   Billing             2.50    February 58  C               DVADA   Billing             2.50    December     59  C               DVADA   Account Management  6.50    December         60  C               DVADA   Buying              1.25    December     61  C               DVADA   Meeting / Call      0.50    December     62  C               DVADA   Account Management  3.00    December     63  C               DVADA   Billing             2.50    December 64  C               DVADA   Account Management  6.50    August       65  C               DVADA   Buying              1.25    August   66  C               DVADA   Meeting / Call      0.50    August   67  C               DVADA   Account Management  3.00    August 68  C               DVADA   Account Management  6.50    April        69  C               DVADA   Buying              1.25    April    70  C               DVADA   Meeting / Call      0.50    April    71  C               DVADA   Account Management  3.00    April 

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

Which returns a pivot table with three columns (client, month, hours). Each client has 12 months (Jan-Dec) and each of those months has a hours for that month.

                        hours client          month  A               April   203.50                 August  227.75                 December 159.75                 February 203.25                 January 199.25  B               April   203.50                 August  227.75                 December 159.75                 February 203.25                 January 199.25  C               April   203.50                 August  227.75                 December 159.75                 February 203.25                 January 199.25 

I want to sort this pivot table by the months but keep the client column in tacked.

                           hours client           month  A               January 203.50                 February 227.75                 March    159.75                 April    203.25                 May     199.90  B               January 203.50                 February 227.75                 March    159.75                 April    203.25                 May     199.90  C               January 203.50                 February 227.75                 March    159.75                 April    203.25                 May     199.90 

The sorting issue is fixed with the answer below from Scott. Now I want to add a row to each client with the total hours used.

                           hours client           month  A               January    203.50                 February   227.75                 March      159.75                 April      203.25                 May        199.90                 Total     1000.34  B               January    203.50                 February   227.75                 March      159.75                 April      203.25                 May       199.90                 Total     1000.34  C               January   203.50                 February   227.75                 March      159.75                 April      203.25                 May       199.90                 Total     1000.34 

Any help will be greatly appreciated

回答1:

Update to add Total at end of each client

vp_clients['month'] = pd.Categorical(vp_clients['month'],                                       ordered=True,                                       categories=['January','February','March',                                                  'April','May','June','July',                                                  'August','September','October',                                                  'November','December','Total'])  df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)  df = df.dropna()  pd.concat([df,df.sum(level=0).assign(month='Total').set_index('month', append=True)]).sort_index() 

Output:

                 hours client month           A      January   11.25        February  16.25        April     11.25        August    11.25        December  16.25        Total     66.25 B      January   11.25        February  16.25        April      8.25        August    11.25        December  16.25        Total     63.25 C      January   11.25        February  16.25        April     14.25        August    11.25        December  16.25        Total     69.25 

Let's use pd.Categorical:

vp_clients['month'] = pd.Categorical(vp_clients['month'],                                       ordered=True,                                       categories=['January','February','March',                                                  'April','May','June','July',                                                  'August','September','October',                                                  'November','December'])  df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)  df.dropna() 

Output:

                 hours client month           A      January   11.25        February  16.25        April     11.25        August    11.25        December  16.25 B      January   11.25        February  16.25        April      8.25        August    11.25        December  16.25 C      January   11.25        February  16.25        April     14.25        August    11.25        December  16.25 


回答2:

Additionally, as commented, since you are not pivoting values to new columns in a wide format, consider simply using groupby(). And re-consider reindex() for custom January-December order, specifying the level and interfacing with python's built-in calendar module.

import calendar ...  grp_df = df.groupby(['client', 'month']).agg({'hours': 'sum'})\            .reindex(level=1, labels=calendar.month_name)  #                  hours # client month           # A      January   11.25 #        February  16.25 #        April     11.25 #        August    11.25 #        December  16.25 # B      January   11.25 #        February  16.25 #        April      8.25 #        August    11.25 #        December  16.25 # C      January   11.25 #        February  16.25 #        April     14.25 #        August    11.25 #        December  16.25 


标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!