Seaborn: countplot() with frequencies

Anonymous (unverified), submitted 2019-12-03 03:05:02

Question:

I have a Pandas DataFrame with a column called "AXLES", which can take an integer value from 3 to 12. I am trying to use Seaborn's countplot() function to produce the following plot:

  1. The left y axis shows the frequencies of these values in the data. The axis extent is [0%, 100%], with tick marks every 10%.
  2. The right y axis shows the actual counts, at the values that correspond to the tick marks of the left y axis (every 10%).
  3. The x axis shows the categories for the bars [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
  4. Annotations on top of the bars show the actual percentage of each category.

The following code gives me a plot of the actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.AXLES.value_counts()/len(df.index), but I am not sure how to plug this information into Seaborn's countplot().
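A minimal sketch of the frequency calculation mentioned above, assuming a small made-up stand-in for dfWIM (the real data is not shown). pandas' value_counts(normalize=True) returns the same proportions as dividing by the row count by hand:

```python
import pandas as pd

# Hypothetical stand-in for the question's dfWIM (the real data is not shown).
dfWIM = pd.DataFrame({"AXLES": [3, 5, 5, 6, 6, 6, 9, 12]})

# Dividing by the row count by hand, as in the question ...
manual = dfWIM.AXLES.value_counts() / len(dfWIM.index)

# ... gives the same proportions as value_counts(normalize=True).
normalized = dfWIM.AXLES.value_counts(normalize=True)

# Scale to percent and sort by axle count, ready for plotting.
freq_pct = (normalized * 100).sort_index()
print(freq_pct)
```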

I also found a workaround for the annotations, but I am not sure if that is the best implementation.

Any help would be appreciated!

Thanks

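    import matplotlib.pyplot as plt
    import seaborn as sns

    # dfWIM is the DataFrame with the integer "AXLES" column
    plt.figure(figsize=(12,8))
    ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
    plt.title('Distribution of Truck Configurations')
    plt.xlabel('Number of Axles')
    plt.ylabel('Frequency [%]')

    # Annotation workaround (note: this still prints the raw count, not a percentage)
    for p in ax.patches:
        ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))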

EDIT:

I got closer to what I need with the following code, which uses Pandas' bar plot instead of Seaborn. It feels like I'm using too many workarounds, and there must be an easier way. The issues with this approach:

  • Pandas' bar plot function has no order keyword like Seaborn's countplot(), so I cannot plot all categories from 3 to 12 as I did with countplot(). I need them shown even when a category has no data.
  • The secondary y axis messes up the bars and the annotations for some reason (white gridlines are drawn over the text and bars).

    import matplotlib.pyplot as plt
    import numpy as np

    plt.figure(figsize=(12,8))
    plt.title('Distribution of Truck Configurations')
    plt.xlabel('Number of Axles')
    plt.ylabel('Frequency [%]')

    # Percentage per category, sorted by axle count
    ax = (dfWIM.AXLES.value_counts()/len(dfWIM)*100).sort_index().plot(kind="bar", rot=0)
    ax.set_yticks(np.arange(0, 110, 10))

    # Secondary axis for the counts
    ax2 = ax.twinx()
    ax2.set_yticks(np.arange(0, 110, 10)*len(dfWIM)/100)

    for p in ax.patches:
        ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))
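Regarding the missing order keyword: one possible workaround, not taken from the answers below, is to reindex the frequency Series before plotting so that every category from 3 to 12 gets a bar even when it has no rows. This is a sketch with made-up data; Series.reindex(..., fill_value=0) is a standard pandas method, but the plotting setup here is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; categories 4, 7, 8, 10 and 11 are absent.
dfWIM = pd.DataFrame({"AXLES": [3, 5, 5, 6, 6, 6, 9, 12]})

# Percent per category, reindexed so all of 3-12 appear in this order,
# with 0 for categories that have no rows (the role of countplot's order=).
freq_pct = (
    dfWIM.AXLES.value_counts(normalize=True)
    .mul(100)
    .reindex(range(3, 13), fill_value=0)
)

ax = freq_pct.plot(kind="bar", rot=0, figsize=(12, 8))
ax.set_xlabel("Number of Axles")
ax.set_ylabel("Frequency [%]")
```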

Answer 1:

You can do this by making a twinx axes for the frequencies. You can then switch the two y axes around, so the frequencies stay on the left and the counts on the right, without having to recalculate the counts axis. Here tick_left() and tick_right() move the ticks, and set_label_position() moves the axis labels.
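A stripped-down sketch of just the axis swap described above, with no data plotted, assuming a non-interactive Agg backend so it runs headless (the "Count" label is a placeholder):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()   # ax will hold the count bars
ax2 = ax.twinx()           # twin axes for the percentage scale

# Move the count ticks to the right and the percentage ticks to the left ...
ax.yaxis.tick_right()
ax2.yaxis.tick_left()

# ... and move the axis labels along with them.
ax.yaxis.set_label_position("right")
ax2.yaxis.set_label_position("left")
ax.set_ylabel("Count")
ax2.set_ylabel("Frequency [%]")
```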

You can then set the ticks using the matplotlib.ticker module, specifically ticker.MultipleLocator and ticker.LinearLocator.
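The following sketch, assuming a hypothetical total of 5000 rows, shows why the two locators line up: LinearLocator(11) splits the count range into ten equal steps, and MultipleLocator(10) places a tick every 10 units on the 0-100 percentage axis.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

ncount = 5000  # hypothetical number of rows in the DataFrame

fig, ax = plt.subplots()
ax2 = ax.twinx()

# Count axis spans 0..ncount; LinearLocator(11) gives 11 evenly spaced ticks
# (10 intervals), whatever the value of ncount.
ax.set_ylim(0, ncount)
ax.yaxis.set_major_locator(ticker.LinearLocator(11))

# Percentage axis spans 0..100; MultipleLocator(10) puts a tick every 10 units.
ax2.set_ylim(0, 100)
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

count_ticks = np.asarray(ax.yaxis.get_majorticklocs())
pct_ticks = np.asarray(ax2.yaxis.get_majorticklocs())

# MultipleLocator may also return ticks just outside the view limits
# (matplotlib clips those when drawing), so keep only the visible ones.
pct_ticks = pct_ticks[(pct_ticks >= 0) & (pct_ticks <= 100)]

# Each count tick sits at the same height as a percentage tick.
print(np.allclose(count_ticks / ncount * 100, pct_ticks))
```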

As for your annotations, you can get the x and y extents of each bar with patch.get_bbox().get_points(), which returns its lower-left and upper-right corners. Together with the correct horizontal and vertical alignment, this means you don't need to add any arbitrary offsets to the annotation position.
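A small sketch of the bounding-box approach with three made-up bars and a hypothetical total of 500 rows; the two corners from get_points() are enough to find each bar's horizontal centre and top edge:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Hypothetical bars: three categories with counts 120, 300 and 80.
ax.bar([0, 1, 2], [120, 300, 80], width=0.8)
ncount = 500  # hypothetical total number of rows

labels = []
for p in ax.patches:
    pts = p.get_bbox().get_points()  # [[x0, y0], [x1, y1]] in data coordinates
    x_center = pts[:, 0].mean()      # midpoint of the left and right edges
    y_top = pts[1, 1]                # top edge, i.e. the bar height
    txt = ax.annotate(f"{100 * y_top / ncount:.1f}%", (x_center, y_top),
                      ha="center", va="bottom")  # centred, sitting on the bar
    labels.append((x_center, y_top, txt.get_text()))
print(labels)
```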

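Finally, you need to turn the grid off for the twinned axis to prevent grid lines from showing up on top of the bars. ax2.grid(False) does this reliably; ax2.grid(None) only toggles the current state, so it switches the grid off only when it was on to begin with.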
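The difference between grid(None) and grid(False) can be checked directly. This sketch assumes the Agg backend and inspects the major y gridlines of a twinned axis; grid_visible is a hypothetical helper written for this check:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()

def grid_visible(axes):
    # Hypothetical helper: True if any major y gridline of `axes` is visible.
    return any(line.get_visible() for line in axes.get_ygridlines())

ax2.grid(False)                 # start from a known state: grid off
ax2.grid(None)                  # with no other arguments, None toggles ...
after_none = grid_visible(ax2)  # ... so the grid is now switched on

ax2.grid(False)                 # False always switches it off
after_false = grid_visible(ax2)
print(after_none, after_false)
```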

Here is a working script:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    import matplotlib.ticker as ticker

    # Some random data
    dfWIM = pd.DataFrame({'AXLES': np.random.normal(8, 2, 5000).astype(int)})
    ncount = len(dfWIM)

    plt.figure(figsize=(12,8))
    ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
    plt.title('Distribution of Truck Configurations')
    plt.xlabel('Number of Axles')

    # Make twin axis
    ax2 = ax.twinx()

    # Switch so count axis is on right, frequency on left
    ax2.yaxis.tick_left()
    ax.yaxis.tick_right()

    # Also switch the labels over
    ax.yaxis.set_label_position('right')
    ax2.yaxis.set_label_position('left')

    ax2.set_ylabel('Frequency [%]')

    for p in ax.patches:
        x = p.get_bbox().get_points()[:,0]
        y = p.get_bbox().get_points()[1,1]
        ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y),
                    ha='center', va='bottom')  # set the alignment of the text

    # Use a LinearLocator to ensure the correct number of ticks
    ax.yaxis.set_major_locator(ticker.LinearLocator(11))

    # Fix the frequency range to 0-100
    ax2.set_ylim(0,100)
    ax.set_ylim(0,ncount)

    # And use a MultipleLocator to ensure a tick spacing of 10
    ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

    # Turn the grid on ax2 off, otherwise the gridlines end up on top of the bars.
    # grid(False) always disables it; grid(None) would only toggle the current state.
    ax2.grid(False)

    plt.savefig('snscounter.pdf')



Answer 2:

I got it to work using core matplotlib's bar plot. I don't have your data, obviously, but adapting this to it should be straightforward.

Approach

I used matplotlib's twin axis and plotted the data as bars on the second Axes object. The rest is just some fiddling to get the ticks right and to add the annotations.

Hope this helps.
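The percentage labels on the left axis come from FormatStrFormatter with the format string '%.0f%%', where %% is a literal percent sign. A minimal, standalone sketch of just that step, assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax1 = plt.subplots()
ax1.set_ylim(0, 100)
ax1.yaxis.set_ticks(np.arange(0, 101, 10))   # a tick every 10 %

# '%.0f%%' formats each tick value as an integer followed by a literal '%'.
fmt = ticker.FormatStrFormatter("%.0f%%")
ax1.yaxis.set_major_formatter(fmt)

# Apply the formatter to each tick position to see the resulting labels.
labels = [fmt(v) for v in ax1.yaxis.get_majorticklocs()]
print(labels)
```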

Code

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.ticker
    import seaborn as sns

    # Random total, split into 12 random category values that sum to it
    tot = np.random.rand() * 100
    data = np.random.rand(1, 12)
    data = data / data.sum(axis=1, keepdims=True) * tot

    df = pd.DataFrame(data)
    palette = sns.husl_palette(9, s=0.7)

    ### Left Axis
    # Plot nothing here; it only provides the 0-100% scale.

    fig, ax1 = plt.subplots()
    ax1.set_ylim([0, 100])

    # Remove grid lines.
    ax1.grid(False)
    # Set ticks and add percentage sign.
    ax1.yaxis.set_ticks(np.arange(0, 101, 10))
    fmt = '%.0f%%'
    yticks = matplotlib.ticker.FormatStrFormatter(fmt)
    ax1.yaxis.set_major_formatter(yticks)

    ### Right Axis
    # Plot data as bars (columns 3-11 of the single row), centred on x.
    x = np.arange(0, 9, 1)
    ax2 = ax1.twinx()
    rects = ax2.bar(x, np.asarray(df.loc[0, 3:]), width=0.8)

    # Set ticks on x-axis and remove grid lines.
    ax2.set_xlim([-0.5, 8.5])
    ax2.xaxis.set_ticks(x)
    ax2.xaxis.grid(False)

    # Set ticks on y-axis in 10% steps.
    ax2.set_ylim([0, tot])
    ax2.yaxis.set_ticks(np.linspace(0, tot, 11))

    # Add labels and change colors.
    for i, r in enumerate(rects):
        h = r.get_height()
        r.set_color(palette[i % len(palette)])
        ax2.text(r.get_x() + r.get_width()/2.0,
                 h + 0.01*tot,
                 r'%d%%' % int(100*h/tot), ha='center')


Answer 3:

I think you can first set the y major ticks manually and then modify each label:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    dfWIM = pd.DataFrame({'AXLES': np.random.randint(3, 10, 1000)})
    total = float(len(dfWIM))

    plt.figure(figsize=(12,8))
    ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
    plt.title('Distribution of Truck Configurations')
    plt.xlabel('Number of Axles')
    plt.ylabel('Frequency [%]')

    for p in ax.patches:
        ax.annotate('{:.1f}%'.format(100*p.get_height()/total), (p.get_x()+0.1, p.get_height()+5))

    # Put 11 ticks (therefore 10 steps) from 0 to the total number of rows in the DataFrame
    ax.yaxis.set_ticks(np.linspace(0, total, 11))

    # Adjust the tick labels to the desired format without changing the tick positions
    ax.set_yticklabels(['{:.1f}%'.format(v) for v in 100*ax.yaxis.get_majorticklocs()/total])
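The tick arithmetic in this answer can be checked in isolation. This sketch assumes a hypothetical total of 1000 rows and reproduces the tick positions and their percentage labels without plotting anything:

```python
import numpy as np

total = 1000.0  # hypothetical number of rows (len(dfWIM) in the answer)

# 11 tick positions (10 steps) on the count axis, from 0 to the total ...
ticks = np.linspace(0, total, 11)

# ... each relabelled as the percentage of rows it represents.
labels = ["{:.1f}%".format(v) for v in 100 * ticks / total]
print(labels)
```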


