Stacked bar plots from list of dataframes with groupby command

穿精又带淫゛_ submitted on 2019-12-11 16:56:16

Question


I want to create a 2x3 grid of stacked bar chart subplots from the results of a groupby.size command. Let me explain. I have a list of dataframes: list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]. A small sample of one of these dataframes looks like this:

...  Create Time          Location         Area Id  Beat  Priority  ...  Closed Time
     2011-01-01 00:00:00  ST&SAN PABLO AV  1.0      06X   1.0       ...  2011-01-01 00:28:17
     2011-01-01 00:01:11  ST&HANNAH ST     1.0      07X   1.0       ...  2011-01-01 01:12:56
     .
     .
     .

(I can only show a few columns, otherwise the layout breaks.) I'm using a groupby.size command to get the required count of events for these dataframes, as shown below:

list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
for i in list_df:
    print(i.groupby(['Beat', 'Priority']).size())
    print(' ')

Producing:

Beat  Priority
01X   1.0          394
      2.0         1816
02X   1.0          644
      2.0         1970
02Y   1.0          661
      2.0         2309
03X   1.0          857
      2.0         2962
.
.
.

I want to identify the 10 beats with the largest totals, where a beat's total is the sum of its counts across all priorities. For example, some of the totals for the output above are:

Beat  Priority  Count  Total for Beat
01X   1.0         394
      2.0        1816            2210
02Y   1.0         661
      2.0        2309            2970
03X   1.0         857
      2.0        2962            3819
.
.
.

So far I have called plot on my groupby.size result, but it doesn't use the per-beat totals described above. See below:

list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
fig, axes = plt.subplots(2, 3)
for d, i in zip(list_df, range(6)):
    ax = axes.ravel()[i];
    d.groupby(['Beat', 'Priority']).size().nlargest(10).plot(ax=ax, kind='bar', figsize=(15, 7), stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {i+ 2011}")
    plt.tight_layout()

I wish to have the 2x3 subplot layout, but with stacked barcharts like this one I have done previously:

Thanks in advance. This has been harder than I thought it would be!


Answer 1:


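For a stacked bar chart, each series to be stacked (here, each Priority) needs to be its own column, with the beats as the index, so you probably want something like this: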

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# create fake input data: one dataframe per year, 2011 to 2016
nrows = 300  # number of rows in each fake dataframe
list_df = [pd.DataFrame({'Beat': np.random.choice(['{:02d}X'.format(i) for i in range(15)], nrows),
                         'Priority': np.random.choice(['1', '2'], nrows),
                         'othercolumn1': range(nrows),
                         'othercol2': range(nrows),
                         'year': [yr] * nrows}) for yr in range(2011, 2017)]

In [22]: print(list_df[0].head(5))
  Beat Priority  othercolumn1  othercol2  year
0  06X        1             0          0  2011
1  05X        1             1          1  2011
2  04X        1             2          2  2011
3  01X        2             3          3  2011
4  00X        1             4          4  2011

fig, axes = plt.subplots(2, 3, figsize=(15, 7))

for i, d in enumerate(list_df):
    ax = axes.flatten()[i]
    # count rows per Beat (index) and Priority (columns)
    dplot = d[['Beat', 'Priority']].pivot_table(index='Beat', columns='Priority', aggfunc=len)
    # rank beats by their total across priorities and keep the top 10
    dplot = (dplot.assign(total=lambda x: x.sum(axis=1))
                  .sort_values('total', ascending=False)
                  .head(10)
                  .drop('total', axis=1))
    dplot.plot.bar(ax=ax, stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {2011 + i}")

plt.tight_layout()
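An alternative to the pivot_table call above stays closer to the question's groupby.size: unstack the Priority level into columns, sum across the columns to get each beat's total, and keep the rows for the 10 largest totals. The following is only a sketch under assumptions. The helper name top_beats is hypothetical (it is not a pandas function), and the data is randomly generated with only the Beat and Priority columns named in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def top_beats(df, n=10):
    """Return a Beat x Priority count table for the n beats with the largest totals.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Count rows per (Beat, Priority), then move Priority into the columns
    counts = df.groupby(['Beat', 'Priority']).size().unstack(fill_value=0)
    # Total per beat across all priorities; keep the labels of the n largest
    top = counts.sum(axis=1).nlargest(n).index
    # Select those beats, ordered from largest total to smallest
    return counts.loc[top]


# Made-up input data (an assumption, not the asker's real dataframes)
rng = np.random.default_rng(0)
nrows = 300
beats = ['{:02d}X'.format(i) for i in range(15)]
list_df = [pd.DataFrame({'Beat': rng.choice(beats, nrows),
                         'Priority': rng.choice([1.0, 2.0], nrows)})
           for _ in range(2011, 2017)]

# One stacked bar chart per year in a 2x3 grid
fig, axes = plt.subplots(2, 3, figsize=(15, 7))
for year, d, ax in zip(range(2011, 2017), list_df, axes.ravel()):
    top_beats(d).plot.bar(ax=ax, stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {year}")
fig.tight_layout()
```

Passing fill_value=0 to unstack means a beat with no events for one priority gets a 0 rather than NaN, so the totals and the stacked bars are computed on complete rows.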



Source: https://stackoverflow.com/questions/58864458/stacked-bar-plots-from-list-of-dataframes-with-groupby-command
