bar
and hist
Given some data in a pandas.Series , rv
, there is a difference between
Obtaining a bar
plot that looks like the hist
plot requires some manipulating of default behavior for bar
.
bar
to use actual x data for plotting range by passing both x (hist.index
) and y (hist.values
). The default bar behavior is to plot the y data against an arbitrary range and put the x data as the label.width
parameter to be related to actual step size of x data (The default is 0.8
)align
parameter to 'center'
.These changes need to be made via matplotlib's bar() called on the axis (ax
) instead of pandas's bar() called on the data (hist
).
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12.0, 8.0)
matplotlib.style.use('ggplot')
# Setup size and distribution
size = 50000
distribution = stats.norm()
# Create random data
rv = pd.Series(distribution.rvs(size=size))
# Get sane start and end points of distribution
start = distribution.ppf(0.01)
end = distribution.ppf(0.99)
# Build PDF and turn into pandas Series
x = np.linspace(start, end, size)
y = distribution.pdf(x)
pdf = pd.Series(y, x)
# Get histogram of random data
y, x = np.histogram(rv, bins=50, normed=True)
# Correct bin edge placement
x = [(a+x[i+1])/2.0 for i,a in enumerate(x[0:-1])]
hist = pd.Series(y, x)
# Plot previously histogrammed data
ax = pdf.plot(lw=2, label='PDF', legend=True)
w = abs(hist.index[1]) - abs(hist.index[0])
ax.bar(hist.index, hist.values, width=w, alpha=0.5, align='center')
ax.legend(['PDF', 'Random Samples'])
Another, simpler solution is to create fake samples that reproduce the same histogram and then simply use hist().
I.e., after retrieving bins
and counts
from stored data, do
fake = np.array([])
for i in range(len(counts)):
a, b = bins[i], bins[i+1]
sample = a + (b-a)*np.random.rand(counts[i])
fake = np.append(fake, sample)
plt.hist(fake, bins=bins)