Is there a numpy builtin to reject outliers from a list

孤城傲影 2020-11-28 18:00

Is there a numpy builtin to do something like the following? That is, take a list d and return a list filtered_d with any outlying elements removed.

10 answers
  •  轻奢々 (original poster)
    2020-11-28 19:01

    Keep in mind that all of the methods above fail when your standard deviation becomes very large because of huge outliers.

    (This is similar to how the average calculation fails in that situation, so that you should calculate the median instead. Though, the average is "more prone to such an error as the stdDv".)
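To make the point above concrete, here is a minimal sketch with made-up numbers. The sample `d` and the 3-sigma threshold are assumptions chosen for illustration, not something taken from the question. It shows a single huge outlier pulling the mean away and inflating the standard deviation so much that a mean-and-standard-deviation filter no longer rejects it, while the median stays with the bulk of the data.

```python
import numpy as np

# Hypothetical sample: seven well-behaved values plus one huge outlier.
d = np.array([9.0, 10.0, 10.0, 11.0, 10.0, 9.0, 11.0, 1000.0])

mean = d.mean()
std = d.std()
median = np.median(d)

# The outlier drags the mean far from the bulk and inflates std,
# while the median is unaffected.
print("mean:", mean, "std:", std, "median:", median)

# A "mean +/- 3*std" filter: the outlier sits inside the inflated band,
# so nothing is rejected.
kept = d[np.abs(d - mean) < 3 * std]
print("kept:", kept)
```

In this sample the outlier lies less than three (inflated) standard deviations from the mean, so it survives the filter, which is the failure mode described above.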

    You could try applying your algorithm iteratively, or you could filter using the interquartile range. (Here "factor" corresponds to an n*sigma range, but only when your data follows a Gaussian distribution.)

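    import numpy as np
    
    def sortoutOutliers(dataIn, factor):
        # Interquartile range: spread of the middle 50% of the data
        quant3, quant1 = np.percentile(dataIn, [75, 25])
        iqr = quant3 - quant1
        # For Gaussian data the IQR is about 1.34898 * sigma
        iqrSigma = iqr / 1.34898
        medData = np.median(dataIn)
        # Keep only values strictly within factor * iqrSigma of the median
        lower = medData - factor * iqrSigma
        upper = medData + factor * iqrSigma
        return [x for x in dataIn if lower < x < upper]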
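
As a usage illustration, the same idea can be sketched with a NumPy boolean mask instead of a list comprehension. The function name `reject_outliers_iqr`, the default `factor=3.0`, and the sample data are assumptions made for this example; this is one possible variant, not a NumPy builtin.

```python
import numpy as np

def reject_outliers_iqr(data, factor=3.0):
    """Keep values within factor * (IQR-based sigma estimate) of the median."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    # For Gaussian data the IQR is about 1.349 standard deviations.
    sigma_est = (q3 - q1) / 1.349
    median = np.median(data)
    # Boolean mask: True for values close enough to the median.
    mask = np.abs(data - median) < factor * sigma_est
    return data[mask]

# Hypothetical sample: seven well-behaved values plus one huge outlier.
d = [9, 10, 10, 11, 10, 9, 11, 1000]
filtered_d = reject_outliers_iqr(d, factor=3.0)
print(filtered_d)
```

Because both the median and the interquartile range ignore the extreme tail, the value 1000 is dropped here while the remaining values are kept. Note that if the interquartile range is zero (more than half of the values identical), the strict comparison rejects everything, so real data may need a guard for that case.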
    
