Multiple histograms in Pandas

攒了一身酷 2020-12-05 06:59

I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its o

5 Answers
  • 2020-12-05 07:43

    Create two DataFrames and draw both histograms on a single matplotlib Axes:

    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    
    df1 = pd.DataFrame({
        'data1': np.random.randn(10),
        'data2': np.random.randn(10)
    })
    
    df2 = df1.copy()
    
    fig, ax = plt.subplots()
    df1.hist(column=['data1'], ax=ax)
    df2.hist(column=['data2'], ax=ax)
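
When the two DataFrames hold different samples, the overlaid bars are easier to compare if both histograms share the same bin edges and are semi-transparent. The following is a minimal sketch under stated assumptions, not the answerer's code: the frames `df1` and `df2`, the column name `value`, and the sample sizes are all hypothetical, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two separate DataFrames of different lengths
df1 = pd.DataFrame({"value": rng.normal(0.0, 1.0, 200)})
df2 = pd.DataFrame({"value": rng.normal(1.0, 1.0, 150)})

# Compute common bin edges from the pooled data so the bars line up
pooled = pd.concat([df1["value"], df2["value"]])
edges = np.histogram_bin_edges(pooled, bins=15)

fig, ax = plt.subplots()
# Series.hist forwards extra keywords (alpha, label) to Axes.hist
df1["value"].hist(ax=ax, bins=edges, alpha=0.5, label="df1")
df2["value"].hist(ax=ax, bins=edges, alpha=0.5, label="df2")
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Passing the same `edges` array to both calls is the design choice that matters here; with the default `bins=10`, each histogram would pick its own bin widths.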
    
  • 2020-12-05 07:51

    From the pandas website (http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                        'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

    plt.figure();

    df4.plot(kind='hist', alpha=0.5)
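
The `plot(kind='hist')` approach expects all series to live in one DataFrame, whereas the question starts from separate DataFrames. One way to bridge the two, sketched here under assumptions, is to combine the columns with `pd.concat` first. The names `df_a`, `df_b`, and the column `x` are hypothetical, and the `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical separate DataFrames with different numbers of rows
df_a = pd.DataFrame({"x": rng.normal(0.0, 1.0, 200)})
df_b = pd.DataFrame({"x": rng.normal(1.0, 1.0, 120)})

# Put both columns side by side; the shorter one is padded with NaN,
# which the histogram plot ignores
combined = pd.concat({"first": df_a["x"], "other": df_b["x"]}, axis=1)

# One call draws both histograms on the same axes with shared bins
ax = combined.plot.hist(bins=20, alpha=0.5)
```

The dictionary keys become the column names and therefore the legend labels.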
    
  • 2020-12-05 07:54

    As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:

    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas
    #import seaborn
    #seaborn.set(style='ticks')
    
    np.random.seed(0)
    df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
    fig, ax = plt.subplots()
    
    a_heights, a_bins = np.histogram(df['A'])
    b_heights, b_bins = np.histogram(df['B'], bins=a_bins)
    
    width = (a_bins[1] - a_bins[0])/3
    
    ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
    ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen')
    #seaborn.despine(ax=ax, offset=10)
    

    And that gives me a bar chart with the A and B bars side by side in each bin (image not included).

  • 2020-12-05 07:55

    In case anyone wants to plot one histogram over another (rather than alternating bars) you can simply call .hist() consecutively on the series you want to plot:

    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas
    
    
    np.random.seed(0)
    df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
    
    df['A'].hist()
    df['B'].hist()
    

    This overlays the two histograms on the same axes (image not included).

    Note that the order in which you call .hist() matters (the first one will be at the back).

  • 2020-12-05 07:59

    Here is the snippet. In my case I specified the bins and range explicitly, because I did not remove outliers as the author of the book did.

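    import matplotlib.pyplot as plt

    # first and others are assumed to be DataFrames with a prglngth column
    fig, ax = plt.subplots()
    ax.hist([first.prglngth, others.prglngth], bins=10, range=(27, 50),
            histtype="bar", label=("First", "Other"))
    ax.set_title("Histogram")
    ax.legend()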
    

    Refer to the Matplotlib multihist example that plots samples of different sizes.
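
Since `first` and `others` are not defined in the snippet above, here is a self-contained sketch of the same `ax.hist` call on synthetic data. The arrays below are randomly generated placeholders of different sizes, not the book's dataset, and the `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic placeholder samples of different sizes (not real data)
first = rng.normal(39.0, 2.5, 400)
others = rng.normal(39.0, 2.5, 500)

fig, ax = plt.subplots()
# Passing a list of arrays draws grouped bars that share the same bins
counts, edges, _ = ax.hist(
    [first, others],
    bins=10,
    range=(27, 50),
    histtype="bar",
    label=("First", "Other"),
)
ax.set_title("Histogram")
ax.legend()
```

Because the samples differ in size, raw counts are not directly comparable; passing `density=True` to `ax.hist` normalizes each sample separately if that is what you need.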
