Trimmed Mean with Percentage Limit in Python?

感动是毒

I am trying to calculate the trimmed mean, which excludes the outliers, of an array.

I found there is a module called scipy.stats.tmean, but it req

3 Answers

夕颜

2020-12-17 10:54

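Here's a manual implementation using floor from the math library. tperc is the total fraction to trim, so half of it is removed from each end:
```
import math

def trimMean(tlist, tperc):
    # number of values to drop from each end
    removeN = int(math.floor(len(tlist) * tperc / 2))
    tlist = sorted(tlist)  # sort a copy so the caller's list is not modified
    if removeN > 0:
        tlist = tlist[removeN:-removeN]
    return sum(tlist) / float(len(tlist))
```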
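
To see the effect on data with an outlier, here is a self-contained Python 3 sketch of the same approach. The name `trim_mean_manual` and the sample data are made up for illustration; `tperc` is still the total fraction removed, half from each end.

```python
import math

def trim_mean_manual(values, tperc):
    """Mean after dropping tperc/2 of the sorted values from each end."""
    remove_n = int(math.floor(len(values) * tperc / 2))
    ordered = sorted(values)           # sorted copy; the input is left untouched
    if remove_n > 0:
        ordered = ordered[remove_n:-remove_n]
    return sum(ordered) / len(ordered)

# Hypothetical sample with one obvious outlier.
data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 100]
plain = sum(data) / len(data)          # pulled upward by the outlier
trimmed = trim_mean_manual(data, 0.2)  # drops one value from each end
```

With these numbers the plain mean is dominated by the 100, while the trimmed mean stays near the bulk of the data.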

长发绾君心

2020-12-17 11:11
Edit:

The method I described previously (at the bottom of this answer) has a problem with this input:
```
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
```
It excludes all the 1s and 6s, because they have the same value as the limits.

Actually, you can just implement the whole thing yourself, following the instructions in the MATLAB documentation. It's apparently simpler =D

Here's the code in Python 2:
```
from numpy import mean
def trimmean(arr, percent):
    n = len(arr)
    # k values are dropped from each end of the sorted data
    k = int(round(n*(float(percent)/100)/2))
    return mean(sorted(arr)[k:n-k])
```
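
As a quick check, count-based trimming can be run on the tie-heavy input from the edit above. The sketch below is written for Python 3 with only the standard library. The helper name `trimmean_by_count` is made up here, and rounding half up via `floor(x + 0.5)` is an assumption, since Python 3's `round` rounds halves to the nearest even integer.

```python
import math
from statistics import mean

def trimmean_by_count(arr, percent):
    """Drop the k smallest and k largest values, then average the rest."""
    n = len(arr)
    # Assumption: round half up; Python 3's round() would round 2.5 to 2.
    k = int(math.floor(n * (percent / 100.0) / 2 + 0.5))
    return mean(sorted(arr)[k:n - k])

data = [1] * 10 + [4] + [6] * 11
result = trimmean_by_count(data, 10)   # k = 1: one 1 and one 6 are dropped
```

Because values are removed by position rather than by comparison with a limit, the remaining 1s and 6s stay in the average.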
You can use numpy.percentile or scipy.stats.scoreatpercentile to convert the percentage into absolute limit values:
```
from scipy.stats import tmean, scoreatpercentile
def trimmean(arr, percent):
    lower_limit = scoreatpercentile(arr, percent/2.0)  # 2.0 avoids integer division in Python 2
    upper_limit = scoreatpercentile(arr, 100-percent/2.0)
    return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(False, False))
```
You should try with various inputs to check on the boundary cases, to get exactly the behaviour that you want.
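
One such boundary check, sketched in Python 3 without scipy: the helper `percentile_linear` below is a hypothetical stand-in that assumes linear interpolation between order statistics, `percent = 10` is just an example value, and the strict comparisons stand in for exclusive limits.

```python
import math

def percentile_linear(sorted_vals, p):
    """Percentile by linear interpolation (assumed convention, not scipy's code)."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac

data = sorted([1] * 10 + [4] + [6] * 11)
percent = 10
lower = percentile_linear(data, percent / 2)
upper = percentile_linear(data, 100 - percent / 2)

# Exclusive limits: values equal to a limit are dropped.
kept = [v for v in data if lower < v < upper]
```

On this tie-heavy input both limits coincide with the repeated values, so `kept` contains only the single 4 and every 1 and 6 is thrown away.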
我在风中等你

2020-12-17 11:15
As of scipy v0.14.0 at least, there is a dedicated function for this (scipy.stats.trim_mean):
```
from scipy import stats
m = stats.trim_mean(X, 0.1) # Trim 10% at both ends
```
which uses stats.trimboth internally.

From the source code you can see that with proportiontocut=0.1 the mean is calculated from 80% of the data. Note that scipy.stats.trim_mean cannot handle np.nan.
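
To see where the 80% figure comes from, and one way to deal with NaN, here is a pure-Python sketch. It assumes the number of values cut from each end is `int(proportiontocut * n)`. The function name `trim_mean_skip_nan` is hypothetical, and dropping NaN before trimming is a design choice of this sketch, not something scipy does for you.

```python
import math

def trim_mean_skip_nan(values, proportiontocut):
    """Trimmed mean over the non-NaN values (illustrative, not scipy's code)."""
    clean = sorted(v for v in values if not math.isnan(v))
    n = len(clean)
    cut = int(proportiontocut * n)     # values removed from each end
    kept = clean[cut:n - cut]
    return sum(kept) / len(kept), len(kept) / n

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0, float("nan")]
m, fraction_used = trim_mean_skip_nan(data, 0.1)
```

Here ten values remain after the NaN is dropped, one is cut from each end, and the mean is taken over the remaining eight, which is 80% of the cleaned data.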