Multiple histograms in Pandas

青春壹個敷衍的年華 提交于 2019-11-27 15:50:46

问题


I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot.

I have the following code:

import nsfg
import matplotlib.pyplot as plt
df = nsfg.ReadFemPreg()
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]

first = live[live.birthord == 1]
others = live[live.birthord != 1]

#fig = plt.figure()
#ax1 = fig.add_subplot(111)

first.hist(column = 'prglngth', bins = 40, color = 'teal', \
           alpha = 0.5)
others.hist(column = 'prglngth', bins = 40, color = 'blue', \
            alpha = 0.5)
plt.show()

The above code does not work when I use ax = ax1 as suggested in: pandas multiple plots not working as hists nor this example does what I need: Overlaying multiple histograms using pandas. When I use the code as it is, it creates two windows with histograms. Any ideas how to combine them?

Here's an example of how I'd like the final figure to look:


回答1:


As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
#import seaborn
#seaborn.set(style='ticks')

np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
fig, ax = plt.subplots()

a_heights, a_bins = np.histogram(df['A'])
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)

width = (a_bins[1] - a_bins[0])/3

ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen')
#seaborn.despine(ax=ax, offset=10)

And that gives me:




回答2:


From the pandas website (http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist):

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                    'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

plt.figure();

df4.plot(kind='hist', alpha=0.5)



回答3:


In case anyone wants to plot one histogram over another (rather than alternating bars) you can simply call .hist() consecutively on the series you want to plot:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas


np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])

df['A'].hist()
df['B'].hist()

This gives you:

Note that the order you call .hist() matters (the first one will be at the back)




回答4:


Here is the snippet, In my case I have explicitly specified bins and range as I didn't handle outlier removal as the author of the book.

fig, ax = plt.subplots()
ax.hist([first.prglngth, others.prglngth], 10, (27, 50), histtype="bar", label=("First", "Other"))
ax.set_title("Histogram")
ax.legend()

Refer Matplotlib multihist plot with different sizes example.




回答5:


You make two dataframes and one matplotlib axis

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'data1': np.random.randn(10),
    'data2': np.random.randn(10)
})

df2 = df1.copy()

fig, ax = plt.subplots()
df1.hist(column=['data1'], ax=ax)
df2.hist(column=['data2'], ax=ax)


来源:https://stackoverflow.com/questions/25539195/multiple-histograms-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!