I am trying to calculate the trimmed mean, which excludes the outliers, of an array.
I found there is a module called scipy.stats.tmean, but it req
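Here's a manual implementation using floor from the math library:
import math
from functools import reduce  # builtin on Python 2, needs this import on Python 3

def trimMean(tlist, tperc):
    # tperc is the total fraction to trim; half of it is cut from each end
    removeN = int(math.floor(len(tlist) * tperc / 2))
    tlist = sorted(tlist)  # sort a copy instead of mutating the caller's list
    if removeN > 0:
        tlist = tlist[removeN:-removeN]
    return reduce(lambda a, b: a + b, tlist) / float(len(tlist))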
Edit:
The method I described previously (at the bottom of this answer) will have a problem with this input:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
It will drop every 1 and every 6, since they have the same value as the limits.
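To make the difference concrete, here is a small plain-Python sketch on that input. The limits 1 and 6 are what I'd assume the 5th and 95th percentiles come out to for this data (a 10% trim), and by_value / by_count are just illustrative names:

```python
data = [1] * 10 + [4] + [6] * 11  # the 22 values listed above

def mean(values):
    return sum(values) / float(len(values))

# Limit-based trimming with exclusive limits: every value equal to a limit is dropped.
lower, upper = 1, 6  # assumed 5th / 95th percentiles of data for a 10% trim
by_value = [x for x in data if lower < x < upper]

# Count-based trimming: drop k values from each end of the sorted data.
k = 1  # int(round(22 * 0.10 / 2))
by_count = sorted(data)[k:len(data) - k]

print(by_value, mean(by_value))        # only the 4 survives
print(len(by_count), mean(by_count))   # 20 values remain
```

The limit-based version throws away 21 of the 22 values, while the count-based one removes exactly one value from each end.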
Actually you can just implement the whole thing yourself, following the instructions in the MATLAB documentation. It's apparently simpler =D
Here's the code in Python 2:
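from numpy import mean

def trimmean(arr, percent):
    n = len(arr)
    # number of values to drop from each end
    k = int(round(n * (float(percent) / 100) / 2))
    # sort first so the slice removes the k lowest and k highest values
    return mean(sorted(arr)[k:n - k])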
You can use numpy.percentile or scipy.stats.scoreatpercentile to turn the percentage into the absolute limits.
from scipy.stats import tmean, scoreatpercentile

def trimmean(arr, percent):
    # percent is the total percentage to trim; half comes off each tail
    lower_limit = scoreatpercentile(arr, percent / 2.0)
    upper_limit = scoreatpercentile(arr, 100 - percent / 2.0)
    return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(False, False))
You should try with various inputs to check on the boundary cases, to get exactly the behaviour that you want.
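One boundary case worth checking in the count-based version is the rounding of k. As far as I know, round() rounds halves away from zero on Python 2 but to the nearest even number on Python 3, so n = 10 with percent = 10 gives k = 1 on one and k = 0 on the other. A minimal sketch of a version-independent count (trim_count is a made-up helper name, not part of any library):

```python
import math

def trim_count(n, percent):
    """Number of values to drop from each end, rounding halves up."""
    exact = n * (float(percent) / 100) / 2
    return int(math.floor(exact + 0.5))

print(trim_count(10, 10))  # exact value is 0.5
print(trim_count(22, 10))  # exact value is about 1.1
print(trim_count(10, 0))   # nothing to trim
```

Whether halves should round up or down is a matter of which behaviour you want to match, so treat this as one possible choice.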
At least for scipy v0.14.0, there is a dedicated function for this (scipy.stats.trim_mean):
from scipy import stats
m = stats.trim_mean(X, 0.1) # Trim 10% at both ends
which uses stats.trimboth internally.
From the source code you can see that with proportiontocut=0.1 the mean is calculated using 80% of the data. Note that scipy.stats.trim_mean cannot handle np.nan.
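If NaNs are a concern, one workaround is to drop them before trimming. Below is a plain-Python sketch of that idea, not scipy's actual code: trim_mean_skipna is a hypothetical helper, and it assumes the cut at each end is the integer part of proportiontocut * n, which is how I read the source.

```python
import math

def trim_mean_skipna(values, proportiontocut):
    """Trimmed mean that ignores NaNs (illustrative helper, not part of scipy)."""
    clean = sorted(v for v in values if not math.isnan(v))
    n = len(clean)
    k = int(proportiontocut * n)  # assumed number of values cut from each end
    kept = clean[k:n - k]
    if not kept:
        raise ValueError("proportiontocut too large for the data")
    return sum(kept) / float(len(kept))

data = [float(v) for v in range(1, 11)] + [float("nan")]
print(trim_mean_skipna(data, 0.1))  # averages 8 of the 10 non-NaN values
```

For a 1-D numpy array, the same idea should be expressible as stats.trim_mean(X[~np.isnan(X)], 0.1).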