I am trying to calculate the trimmed mean of an array, i.e. the mean with the outliers excluded.
I found the function scipy.stats.tmean, but it requires the user to specify the limits as absolute values instead of percentages.
In MATLAB, m = trimmean(X, percent) does exactly what I want.
Is there a counterpart in Python?
At least for scipy v0.14.0, there is a dedicated (but undocumented?) function for this:
    from scipy import stats
    m = stats.trim_mean(X, 0.1)  # Trim 10% from each end (20% in total)
which uses stats.trimboth internally.
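One thing to watch when porting: MATLAB's percent is the total share removed (split between both tails), while trim_mean's argument is the share cut from each end. A minimal sketch of a wrapper, assuming you want MATLAB's convention; the name matlab_style_trimmean is hypothetical. As far as I know, trim_mean truncates the number of values it cuts, whereas MATLAB rounds it by default, so very small arrays can still disagree.

```python
import numpy as np
from scipy import stats


def matlab_style_trimmean(x, percent):
    """Hypothetical helper: `percent` is the total share removed, as in
    MATLAB's trimmean, so each tail gets percent / 2, i.e. percent / 200
    as a proportion for scipy's trim_mean."""
    return stats.trim_mean(x, percent / 200.0)


x = np.array([1, 2, 3, 4, 100])
print(stats.trim_mean(x, 0.2))       # drops 1 value from each end -> 3.0
print(matlab_style_trimmean(x, 40))  # same call as above -> 3.0
```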
Edit:
The method I described previously (at the bottom of this answer) has a problem with this input:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
because it drops every 1 and every 6: they are equal to the limits, and the limits are exclusive.
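To make the failure concrete, here is a sketch that mirrors the percentile-based method using only NumPy: np.percentile for the limits and a strict mask standing in for tmean with inclusive=(False, False). The choice of percent = 10 is an assumption for illustration. For comparison it also computes the count-based version (sort, then drop k values from each end).

```python
import numpy as np

# The problematic input: ten 1s, one 4, eleven 6s.
arr = np.array([1] * 10 + [4] + [6] * 11, dtype=float)
percent = 10  # assumed: trim 10% in total, 5% from each end

# Percentile-based limits, as in the older method.
lower = np.percentile(arr, percent / 2)        # 1.0
upper = np.percentile(arr, 100 - percent / 2)  # 6.0

# Exclusive limits drop every value equal to a limit, so only the 4 survives.
limit_based = arr[(arr > lower) & (arr < upper)].mean()

# Count-based trimming: sort, then drop k values from each end.
n = len(arr)
k = int(round(n * percent / 100 / 2))  # k = 1 here
count_based = np.sort(arr)[k:n - k].mean()

print(limit_based)  # 4.0
print(count_based)  # 3.65
```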
Actually, you can just implement the whole thing yourself by following the instructions in the MATLAB documentation. It's apparently simpler =D
Here's the code in Python 2:
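    from numpy import mean, sort

    def trimmean(arr, percent):
        # Values to drop from each end: k = n * (percent / 100) / 2, rounded
        n = len(arr)
        k = int(round(n * (float(percent) / 100) / 2))
        # Sort first, then drop the k lowest and k highest values
        return mean(sort(arr)[k:n - k])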
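A caveat on the rounding step: on Python 3, round() rounds halves to even (round(2.5) == 2), whereas MATLAB's round() goes away from zero, so the two can pick a different k when n * percent / 200 lands exactly on .5. A minimal pure-Python sketch, assuming you want MATLAB's half-away-from-zero behaviour; the name trimmean_half_up and the choice to raise an error when nothing is left are my own, not MATLAB's.

```python
import math


def trimmean_half_up(values, percent):
    """Sketch of a MATLAB-style trimmed mean with k rounded half away from zero.

    k is never negative, so floor(x + 0.5) rounds 2.5 up to 3,
    unlike Python 3's round(2.5) == 2.
    """
    data = sorted(values)
    n = len(data)
    k = math.floor(n * percent / 200 + 0.5)
    trimmed = data[k:n - k]
    if not trimmed:
        # Hypothetical choice: refuse instead of returning NaN.
        raise ValueError("percent too large: nothing left after trimming")
    return sum(trimmed) / len(trimmed)


data = [9, 0, 9, 1, 0, 2, 9, 0, 9, 9]  # n = 10, unsorted on purpose
print(trimmean_half_up(data, 50))  # k = 3 -> mean of [1, 2, 9, 9] = 5.25
```

With round() instead, k would be 2 and the result 5.0 for the same data.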
You can use numpy.percentile or scipy.stats.scoreatpercentile to get the absolute limits:
    from scipy.stats import tmean, scoreatpercentile

    def trimmean(arr, percent):
        lower_limit = scoreatpercentile(arr, percent / 2.0)
        upper_limit = scoreatpercentile(arr, 100 - percent / 2.0)
        return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(False, False))
You should test various inputs to check the boundary cases and make sure you get exactly the behaviour you want.
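One cheap boundary check is to compare how many values each convention cuts per tail across a range of array sizes. The sketch below assumes MATLAB rounds k to the nearest integer and that scipy's trim_mean truncates it (my understanding of both, worth verifying against your versions); all function names are hypothetical.

```python
import math


def k_round_half_up(n, percent):
    # MATLAB-style count per tail: n * (percent / 100) / 2, rounded
    return math.floor(n * percent / 200 + 0.5)


def k_truncate(n, proportion):
    # Truncated count per tail, as (I believe) scipy's trim_mean computes it
    return int(proportion * n)


def differing_sizes(percent, max_n=30):
    """Array sizes n where the two conventions trim a different number of values."""
    proportion = percent / 200
    return [n for n in range(1, max_n + 1)
            if k_round_half_up(n, percent) != k_truncate(n, proportion)]


print(differing_sizes(20))
# [5, 6, 7, 8, 9, 15, 16, 17, 18, 19, 25, 26, 27, 28, 29]
```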