How to make matplotlib/pandas bar chart look like hist chart?

Happy的楠姐 2020-12-18 08:10

Plotting Differences between bar and hist

Given some data in a pandas.Series, rv, there is a difference between plotting it with hist and plotting pre-computed histogram values with bar.

2 answers
  • 2020-12-18 08:37

    Bar plotting differences

    Obtaining a bar plot that looks like the hist plot requires overriding some of bar's default behavior.

    1. Force bar to use the actual x data for the plotting range by passing both x (hist.index) and y (hist.values). The default pandas bar behavior is to plot the y data against integer positions (0, 1, 2, ...) and use the x data only as tick labels.
    2. Set the width parameter to the actual step size of the x data (matplotlib's default is 0.8).
    3. Set the align parameter to 'center'.
    4. Manually set the axis legend.

    These changes need to be made via matplotlib's bar() called on the axis (ax) instead of pandas's bar() called on the data (hist).
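
    The difference described in step 1 can be checked on a tiny Series. This is only a sketch: the three-value Series, the width of 1.5, and the Agg backend are assumptions made so the snippet runs on its own without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pre-binned data: index = bin centers, values = heights
hist = pd.Series([0.1, 0.4, 0.1], index=[-1.5, 0.0, 1.5])

# pandas: bars sit at integer positions; the index is used only for tick labels
ax_pd = hist.plot.bar()
pandas_ticks = [float(t) for t in ax_pd.get_xticks()]

# matplotlib: bars are centered on the actual x values
fig, ax_mpl = plt.subplots()
bars = ax_mpl.bar(hist.index, hist.values, width=1.5, align="center")
mpl_centers = [float(p.get_x() + p.get_width() / 2) for p in bars.patches]

print("pandas tick positions:", pandas_ticks)
print("matplotlib bar centers:", mpl_centers)
```

    Comparing the two printed lists shows why a line plot drawn on real x values (such as the PDF) only lines up with the bars in the matplotlib case.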

    Example Plotting

    %matplotlib inline
    
    import numpy as np
    import pandas as pd
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    plt.rcParams['figure.figsize'] = (12.0, 8.0)
    plt.style.use('ggplot')
    
    # Setup size and distribution
    size = 50000
    distribution = stats.norm()
    
    # Create random data
    rv = pd.Series(distribution.rvs(size=size))
    # Get sane start and end points of distribution
    start = distribution.ppf(0.01)
    end = distribution.ppf(0.99)
    
    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = distribution.pdf(x)
    pdf = pd.Series(y, x)
    
    # Get histogram of random data (density=True replaces the removed normed=True)
    y, edges = np.histogram(rv, bins=50, density=True)
    # Convert bin edges to bin centers
    x = (edges[:-1] + edges[1:]) / 2.0
    hist = pd.Series(y, x)
    
    # Plot previously histogrammed data
    ax = pdf.plot(lw=2, label='PDF', legend=True)
    # Bar width is the bin step; taking abs() of each term would flip the sign for negative x
    w = hist.index[1] - hist.index[0]
    ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
    ax.legend(['PDF', 'Random Samples'])
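
    The example above assumes equal-width bins, so a single width w is enough. If the bins are not uniform, one possible variant (not part of the original answer) is to pass the left edges with align='edge' and a per-bar width from np.diff. The bin edges, sample size, and Agg backend below are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=5000)

# Hypothetical non-uniform bin edges, chosen only for illustration
edges = np.array([-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0])
density, edges = np.histogram(samples, bins=edges, density=True)

fig, ax = plt.subplots()
# Left edge of each bin as x, one width per bar, aligned on the edge
bars = ax.bar(edges[:-1], density, width=np.diff(edges),
              align="edge", alpha=0.5)
```

    With align='edge' there is no need to compute bin centers at all, since each bar starts at its bin's left edge and spans exactly that bin.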
    

  • 2020-12-18 08:45

    Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().

    I.e., after retrieving bins and counts from stored data, do

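    import numpy as np
    import matplotlib.pyplot as plt

    # bins: array of bin edges; counts: integer count per bin (one fewer than edges)
    samples = []
    for i in range(len(counts)):
        a, b = bins[i], bins[i+1]
        # Draw counts[i] uniform values inside [a, b)
        samples.append(a + (b-a)*np.random.rand(int(counts[i])))
    fake = np.concatenate(samples)
    
    plt.hist(fake, bins=bins)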
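
    A related option that avoids generating fake samples is to pass one value per bin and let the weights argument of hist carry the stored counts. This is a different technique from the one above, sketched here with made-up data; the sample data, bin count, and Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for "stored" histogram data (hypothetical)
rng = np.random.default_rng(0)
counts, bins = np.histogram(rng.normal(size=1000), bins=20)

fig, ax = plt.subplots()
# One point per bin (its left edge), weighted by that bin's count
n, out_bins, patches = ax.hist(bins[:-1], bins=bins, weights=counts)
```

    Because the weights need not be integers, this also works when the stored values are densities rather than raw counts.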
    