Can scipy.stats identify and mask obvious outliers?

终归单人心 2020-12-07 19:44

With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x,y experimental data, and initially visually inspecting each x,y scatter plot …

4 Answers
  •  一个人的身影
    2020-12-07 20:10

    The statsmodels package has what you need: fitted regression results provide an outlier_test() method. Look at this short code snippet and its output:

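    # Imports #
    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as smapi
    from statsmodels.graphics.regressionplots import plot_fit
    # Make data (list() so that .insert() works in Python 3) #
    x = list(range(30))
    y = [xi * 10 for xi in x]
    # Add outlier #
    x.insert(6, 15)
    y.insert(6, 220)
    # Fit y on x with an intercept: OLS takes (endog, exog) #
    regression = smapi.OLS(y, smapi.add_constant(x)).fit()
    # Make graph; exog column 1 is x, column 0 is the constant #
    figure = plot_fit(regression, 1)
    # Find outliers #
    # outlier_test() columns: studentized residual, unadjusted p, Bonferroni p.
    # 0.5 is a loose cutoff on the Bonferroni p-value; 0.05 is more common.
    bonf_p = np.asarray(regression.outlier_test())[:, 2]
    outliers = [(x[i], y[i]) for i, p in enumerate(bonf_p) if p < 0.5]
    print('Outliers: ', outliers)
    plt.show()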
    

    Example figure 1

    Outliers: [(15, 220)]
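Since the question asks about scipy, here is a sketch of the kind of test outlier_test() runs by default (method='bonf'), rebuilt with NumPy and scipy.stats only. It assumes a straight-line model with an intercept; the function name bonferroni_outlier_test, the 0.05 cutoff and the deterministic noise in the usage lines are illustrative assumptions, not part of either library. It computes externally studentized residuals (each residual scaled by the error variance estimated without that point), two-sided t-test p-values with n - p - 1 degrees of freedom, and Bonferroni-corrected p-values, which should correspond to the three columns outlier_test() reports.

```python
import numpy as np
from scipy import stats


def bonferroni_outlier_test(x, y):
    """Sketch of a Bonferroni outlier test for the straight-line model y = a + b*x.

    Returns (studentized residuals, unadjusted p-values, Bonferroni p-values),
    intended to mirror the three columns of statsmodels' outlier_test().
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])           # design matrix with intercept
    p = X.shape[1]                                 # number of fitted parameters
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit
    resid = y - X @ beta
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    df = n - p - 1                                 # residual df with one point left out
    # Error variance re-estimated without observation i (closed-form leave-one-out)
    s2_loo = (resid @ resid - resid**2 / (1 - h)) / df
    t_resid = resid / np.sqrt(s2_loo * (1 - h))    # externally studentized residuals
    unadj_p = 2 * stats.t.sf(np.abs(t_resid), df)  # two-sided t-test p-values
    bonf_p = np.minimum(unadj_p * n, 1.0)          # Bonferroni correction over n tests
    return t_resid, unadj_p, bonf_p


# Usage on data shaped like the answer's example, with small deterministic
# (hypothetical) noise instead of random() so the result is reproducible
x = list(range(30))
y = [10 * xi + 200 + ((7 * xi) % 5 - 2) * 0.5 for xi in x]
x.insert(6, 15)
y.insert(6, 220)
student_resid, unadj_p, bonf_p = bonferroni_outlier_test(x, y)
print([(x[i], y[i]) for i in np.flatnonzero(bonf_p < 0.05)])  # prints [(15, 220)]
```

The leave-one-out variance uses the identity SSR_(i) = SSR - e_i^2 / (1 - h_ii), so no refitting loop is needed.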

    Edit

    With newer versions of statsmodels (and pandas), things have changed a bit. Here is an updated snippet, using the formula interface, that performs the same outlier detection.

    # Imports #
    from random import random
    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.formula.api import ols
    from statsmodels.graphics.regressionplots import plot_fit, abline_plot
    # Make data: a noisy line with intercept 200 #
    x = list(range(30))
    y = [xi * (10 + random()) + 200 for xi in x]
    # Add outlier #
    x.insert(6, 15)
    y.insert(6, 220)
    # Make fit #
    regression = ols("data ~ x", data=dict(data=y, x=x)).fit()
    # Find outliers #
    # outlier_test() may return a DataFrame or an ndarray; column 2 holds the
    # Bonferroni-corrected p-values (pandas no longer has DataFrame.icol)
    test = regression.outlier_test()
    bonf_p = np.asarray(test)[:, 2]
    outliers = [(x[i], y[i]) for i, p in enumerate(bonf_p) if p < 0.5]
    print('Outliers: ', outliers)
    # Figure; exog column 1 is x, column 0 is the intercept #
    figure = plot_fit(regression, 1)
    # Add regression line #
    abline_plot(model_results=regression, ax=figure.axes[0])
    plt.show()
    

    Example figure 2

    Outliers: [(15, 220)]
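To go from identifying outliers to masking them, as the question asks, the flagged points can be dropped with a boolean mask and the line refitted with scipy.stats.linregress. This is a minimal sketch: the helper name refit_without_outliers, the alpha=0.05 default and the toy data are hypothetical, and bonf_p is assumed to be the Bonferroni column, e.g. np.asarray(regression.outlier_test())[:, 2].

```python
import numpy as np
from scipy import stats


def refit_without_outliers(x, y, bonf_p, alpha=0.05):
    """Mask points whose Bonferroni p-value is below alpha, then refit with linregress.

    bonf_p is assumed to be the Bonferroni column of outlier_test(),
    e.g. np.asarray(regression.outlier_test())[:, 2].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_outlier = np.asarray(bonf_p) < alpha                 # boolean mask of flagged points
    fit = stats.linregress(x[~is_outlier], y[~is_outlier])  # fit the inliers only
    return fit, is_outlier


# Hypothetical toy data: y = 2x + 1 except the planted outlier at x = 3
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 40, 9, 11]
# Hypothetical p-values standing in for real outlier_test() output
bonf_p = [1.0, 1.0, 1.0, 1e-6, 1.0, 1.0]
fit, mask = refit_without_outliers(x, y, bonf_p)
print(fit.slope, fit.intercept, mask)
```

Boolean indexing leaves the original arrays untouched, so the flagged points stay available for plotting or reporting alongside the refitted line.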
