Trimmed Mean with Percentage Limit in Python?

感动是毒 2020-12-17 10:39

I am trying to calculate the trimmed mean of an array, i.e. the mean computed after excluding the outliers.

I found there is a function called scipy.stats.tmean, but it req…
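For context, here is a minimal sketch of what scipy.stats.tmean expects. Its limits argument takes absolute values (lower, upper), not a percentage, so you need to know the cut-off values in advance. The sample data is made up for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 100.0])  # illustrative data with one outlier

# tmean trims by *value*: only elements inside [lower, upper] are averaged
# (both limits are inclusive by default). The limits here are absolute
# numbers, not a percentage of the data.
m = stats.tmean(data, limits=(3, 6))
print(m)  # mean of [3, 4, 5, 6] -> 4.5
```
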

3 Answers
  • 2020-12-17 10:54

    Here's a manual implementation using floor from the math module:

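    import math

    def trimMean(tlist, tperc):
        # tperc is the total fraction to trim, split across both ends
        # (e.g. 0.2 drops 10% of the values from each end).
        removeN = int(math.floor(len(tlist) * tperc / 2))
        tlist = sorted(tlist)  # sort a copy instead of mutating the caller's list
        if removeN > 0:
            tlist = tlist[removeN:-removeN]
        return sum(tlist) / float(len(tlist))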
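    A quick check of the percentage convention, assuming SciPy is available. In trimMean, tperc is the total fraction removed (split evenly between both ends), whereas scipy.stats.trim_mean takes the fraction cut from each end, so trimMean(x, 0.2) should match trim_mean(x, 0.1) whenever the cut counts come out as whole numbers. The helper trim_mean_total is a hypothetical restatement of trimMean, not part of any library.

```python
import math
from scipy import stats

def trim_mean_total(values, total_fraction):
    # Hypothetical restatement of trimMean: total_fraction is split across both ends.
    remove_n = int(math.floor(len(values) * total_fraction / 2))
    ordered = sorted(values)
    if remove_n > 0:
        ordered = ordered[remove_n:-remove_n]
    return sum(ordered) / float(len(ordered))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

manual = trim_mean_total(x, 0.2)         # removes 1 value from each end
scipy_version = stats.trim_mean(x, 0.1)  # cuts 10% from each end
print(manual, scipy_version)  # both 5.5

# With few elements, floor() can round the cut down to zero:
# 9 * 0.2 / 2 = 0.9 -> nothing is trimmed, so the outlier stays in.
print(trim_mean_total(x[1:], 0.2))  # 116.0
```

    The floor() in the cut count means small inputs may not be trimmed at all; that is worth keeping in mind when tperc is small relative to the list length.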
    
  • 2020-12-17 11:11

    Edit:

    The method I described previously (at the bottom of this answer) will have a problem with this input:

    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
    

    Because every 1 and every 6 is equal to one of the limits, and the limits are exclusive, all of them are dropped instead of just the intended fraction.

    Actually, you can just implement the whole thing yourself, following the description in the MATLAB documentation for trimmean. It's apparently simpler =D

    Here's the code in Python:

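    import numpy as np

    def trimmean(arr, percent):
        # MATLAB-style: drop k values from each end of the *sorted* data,
        # where k = n * (percent / 100) / 2, rounded half up like MATLAB's round.
        arr = np.sort(np.asarray(arr, dtype=float))
        n = len(arr)
        k = int(np.floor(n * float(percent) / 200.0 + 0.5))
        return np.mean(arr[k:n - k])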
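    For reference, here is a sketch of the definition, assuming the MATLAB documentation's default behaviour of rounding k to the nearest integer. Here x_(1) ≤ … ≤ x_(n) are the sorted values and the indices are 1-based:

```latex
k = \operatorname{round}\!\left(\frac{n}{2}\cdot\frac{\text{percent}}{100}\right),
\qquad
\bar{x}_{\text{trim}} = \frac{1}{n-2k}\sum_{i=k+1}^{n-k} x_{(i)}
```

    For example, with n = 22 and percent = 20, k = round(2.2) = 2, so the mean is taken over x_(3) through x_(20), i.e. n − 2k = 18 values. Since the 1-based x_(k+1) sits at position k in Python's 0-based indexing, the matching slice of the sorted array is [k:n-k], not [k+1:n-k].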
    

    You can use numpy.percentile or scipy.stats.scoreatpercentile to convert the percentage into the absolute limit values that tmean expects.

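    from scipy.stats import tmean, scoreatpercentile

    def trimmean(arr, percent):
        # Turn the percentage into absolute cut-off values for tmean.
        lower_limit = scoreatpercentile(arr, percent / 2.0)
        upper_limit = scoreatpercentile(arr, 100 - percent / 2.0)
        # Exclusive limits: values equal to a limit are dropped as well.
        return tmean(arr, limits=(lower_limit, upper_limit), inclusive=(False, False))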
    

    You should try various inputs to check the boundary cases and make sure you get exactly the behaviour you want.
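    As an illustration of such a boundary case, here is a sketch that runs both approaches from this answer on the tie-heavy input from the edit, trimming 20% in total (10% per end). The names limits_trimmean and rank_trimmean are hypothetical labels for the two methods; the percentile limits come from numpy.percentile with its default linear interpolation.

```python
import numpy as np
from scipy.stats import tmean

def limits_trimmean(arr, percent):
    # Percentile-limit approach: turn the percentage into absolute cut-off values,
    # then drop every element equal to or beyond them (exclusive limits).
    lower = np.percentile(arr, percent / 2.0)
    upper = np.percentile(arr, 100 - percent / 2.0)
    return tmean(arr, limits=(lower, upper), inclusive=(False, False))

def rank_trimmean(arr, percent):
    # Rank-based approach: drop exactly k sorted values from each end.
    arr = np.sort(np.asarray(arr, dtype=float))
    n = len(arr)
    k = int(np.floor(n * percent / 200.0 + 0.5))  # round half up
    return np.mean(arr[k:n - k])

data = [1] * 10 + [4] + [6] * 11  # the input from the edit (22 values)

print(limits_trimmean(data, 20))  # every 1 and 6 hits a limit, so only the 4 is left -> 4.0
print(rank_trimmean(data, 20))    # k = 2: two 1's and two 6's dropped; mean is 66/18 (about 3.667)
```

    The rank-based version keeps most of the tied 1's and 6's, while the percentile-limit version collapses to the single middle value.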

  • 2020-12-17 11:15

    Since at least SciPy v0.14.0, there is a dedicated function for this (scipy.stats.trim_mean):

    from scipy import stats
    X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # example data with one outlier
    m = stats.trim_mean(X, 0.1)  # trim 10% from each end
    

    which, in that version, used stats.trimboth internally.

    The source code shows that with proportiontocut=0.1, 10% of the values are cut from each end, so the mean is calculated from the middle 80% of the data. Note that scipy.stats.trim_mean cannot handle np.nan.
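    A small sketch of both points, assuming a reasonably recent SciPy and NumPy. With 10 values and proportiontocut=0.1, one value is cut from each end, leaving 8 of 10 (80%). Since trim_mean cannot handle np.nan, one option is to drop NaNs yourself before calling it; that filtering step is an assumption about the behaviour you want, not something trim_mean does for you.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0, np.nan])

# Drop NaNs first; the cleaned array has 10 values.
clean = x[~np.isnan(x)]

# proportiontocut=0.1 -> 1 value cut from each end of the 10,
# so the mean is taken over the middle 8 values (80% of the data).
m = stats.trim_mean(clean, 0.1)
print(len(clean), m)  # 10 5.5
```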
