Is there a numpy builtin to reject outliers from a list

后端未结

关注

 10  766

孤城傲影 2020-11-28 18:00

Is there a numpy builtin to do something like the following? That is, take a list d and return a list filtered_d with any outlying elements removed

10条回答

轻奢々 (楼主)

2020-11-28 19:01
Consider that all the above methods fail when your standard deviation gets very large due to huge outliers.

(Simalar as the average caluclation fails and should rather caluclate the median. Though, the average is "more prone to such an error as the stdDv".)

You could try to iteratively apply your algorithm or you filter using the interquartile range: (here "factor" relates to a n*sigma range, yet only when your data follows a Gaussian distribution)
```
import numpy as np

def sortoutOutliers(dataIn,factor):
    quant3, quant1 = np.percentile(dataIn, [75 ,25])
    iqr = quant3 - quant1
    iqrSigma = iqr/1.34896
    medData = np.median(dataIn)
    dataOut = [ x for x in dataIn if ( (x > medData - factor* iqrSigma) and (x < medData + factor* iqrSigma) ) ] 
    return(dataOut)
```
0 讨论(0)

查看其它10个回答
发布评论:

提交评论
- 加载中...