Quantile-Quantile Plot using SciPy

长情又很酷 2020-12-04 06:44

How would you create a qq-plot using Python?

Assuming that you have a large set of measurements and are using some plotting function that takes XY-values as input. T

9 answers
  • 2020-12-04 06:57

    Another option is qqplot from statsmodels.api. A very basic example:

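    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    
    # 1000 draws from a standard normal distribution
    test = np.random.normal(0, 1, 1000)
    
    # line='45' adds the 45-degree reference line
    sm.qqplot(test, line='45')
    plt.show()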
    

    Documentation and more examples are in the statsmodels documentation for qqplot.

  • 2020-12-04 06:58

    How big is your sample? Here is another option that tests your data against any distribution, using the OpenTURNS library. In the example below, I generate a sample x of 1,000,000 numbers from a Uniform distribution and test it against a Normal distribution. You can replace x with your own data if you reshape it as x = [[x1], [x2], ..., [xn]].

    import openturns as ot
    
    x = ot.Uniform().getSample(1000000)
    g = ot.VisualTest.DrawQQplot(x, ot.Normal())
    g
    

    In my Jupyter Notebook, the final g displays the plot inline. If you are writing a script, display the graph explicitly:

    from openturns.viewer import View
    import matplotlib.pyplot as plt
    View(g)
    plt.show()
    
  • 2020-12-04 07:03

    I came up with this; maybe you can improve it. In particular, the way it generates the quantiles of the distribution seems cumbersome to me.

    You could replace np.random.normal with any other distribution from np.random to compare data against other distributions.

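    #!/usr/bin/env python
    
    import numpy as np
    
    measurements = np.random.normal(loc=20, scale=5, size=100000)
    
    def qq_plot(data, sample_size):
        """Pair sorted data with sorted draws from a standard normal."""
        qq = np.ones([sample_size, 2])
        np.random.shuffle(data)  # note: shuffles the input array in place
        qq[:, 0] = np.sort(data[0:sample_size])  # quantiles of a random subsample
        qq[:, 1] = np.sort(np.random.normal(size=sample_size))  # simulated normal quantiles
        return qq
    
    print(qq_plot(measurements, 1000))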
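
    One possible way to make the quantile step less cumbersome is to evaluate the inverse CDF (ppf) of the reference distribution at fixed plotting positions instead of sorting random draws. The sketch below assumes SciPy is available; the name qq_points and the (i - 0.5) / n plotting positions are illustrative choices, not part of the answer above, and other position conventions exist.

```python
import numpy as np
from scipy import stats


def qq_points(data, dist=stats.norm):
    """Return an (n, 2) array: column 0 = sorted data, column 1 = theoretical quantiles.

    qq_points is a hypothetical helper name used only for this sketch.
    """
    ordered = np.sort(np.asarray(data, dtype=float))  # works on a copy, input is not modified
    n = ordered.size
    # Plotting positions (i - 0.5) / n for i = 1..n (one common convention).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = dist.ppf(probs)  # inverse CDF of the reference distribution
    return np.column_stack([ordered, theoretical])


rng = np.random.default_rng(0)
measurements = rng.normal(loc=20, scale=5, size=1000)
qq = qq_points(measurements)
```

    The two columns of qq can be passed straight to any XY plotting function. Swapping dist for another frozen or unfrozen scipy.stats distribution (for example stats.uniform) compares the data against that distribution instead.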
    
  • 2020-12-04 07:04

    If you need a QQ plot of one sample against another, statsmodels includes qqplot_2samples(). Like Ricky Robinson in a comment above, I think of this as a true QQ plot, as opposed to a probability plot, which compares a sample against a theoretical distribution.

    http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot_2samples.html
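
    To illustrate what a two-sample QQ plot compares, here is a minimal NumPy-only sketch. It is not how statsmodels implements qqplot_2samples(); the helper name qq_two_samples, the number of quantiles and the plotting positions are assumptions made for the example.

```python
import numpy as np


def qq_two_samples(a, b, n_quantiles=100):
    """Return matching quantiles of two samples, which may differ in size.

    qq_two_samples is a hypothetical helper name used only for this sketch.
    """
    # Evaluate both empirical quantile functions at the same probabilities.
    probs = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles
    return np.quantile(a, probs), np.quantile(b, probs)


rng = np.random.default_rng(1)
a = rng.normal(0, 1, size=5000)  # reference sample
b = rng.normal(3, 2, size=800)   # shifted and scaled sample

qa, qb = qq_two_samples(a, b)

# If both samples come from the same location-scale family, the points
# (qa, qb) lie close to a straight line; here roughly qb = 3 + 2 * qa.
slope, intercept = np.polyfit(qa, qb, 1)
```

    Plotting qb against qa gives the two-sample QQ plot; the fitted slope and intercept only summarise how the second sample is scaled and shifted relative to the first.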

  • 2020-12-04 07:04

    To add to the confusion around Q-Q plots and probability plots in the Python and R worlds, this is what the SciPy manual says:

    "probplot generates a probability plot, which should not be confused with a Q-Q or a P-P plot. Statsmodels has more extensive functionality of this type, see statsmodels.api.ProbPlot."

    If you try out scipy.stats.probplot, you'll see that it does indeed compare a dataset to a theoretical distribution. Q-Q plots, on the other hand, compare two datasets (samples).

    R has functions qqnorm, qqplot and qqline. From the R help (Version 3.6.3):

    qqnorm is a generic function the default method of which produces a normal QQ plot of the values in y. qqline adds a line to a “theoretical”, by default normal, quantile-quantile plot which passes through the probs quantiles, by default the first and third quartiles.

    qqplot produces a QQ plot of two datasets.

    In short, R's qqnorm offers the same functionality that scipy.stats.probplot provides with the default setting dist="norm". But the fact that they called it qqnorm and that it's supposed to "produce a normal QQ plot" may easily confuse users.
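
    As a small illustration of what scipy.stats.probplot returns when no plot is requested, the sketch below uses synthetic data (the sample, its parameters and the seed are assumptions for the example). The returned XY-values can be handed to any plotting function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=20, scale=5, size=1000)  # synthetic measurements

# With the default fit=True, probplot returns the point coordinates and
# a least-squares line fitted through them:
#   osm: theoretical quantiles of the reference distribution (x-values)
#   osr: the sample, sorted (y-values)
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
```

    For normally distributed data the fitted slope and intercept come out close to the sample's standard deviation and mean, and r close to 1.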

    Finally, a word of warning. These plots don't replace proper statistical testing and should be used for illustrative purposes only.

  • 2020-12-04 07:09

    You can use Bokeh, with scipy.stats.probplot supplying the points:

    from bokeh.plotting import figure, show
    from scipy.stats import probplot
    
    # pd_series is the series you want to plot.
    # probplot returns ((theoretical quantiles, ordered values), (slope, intercept, r)).
    (theoretical, ordered), _ = probplot(pd_series, dist="norm")
    p1 = figure(title="Normal QQ-Plot", background_fill_color="#E8DDCB")
    p1.scatter(theoretical, ordered, fill_color="red")
    show(p1)
    