Weighted percentile using numpy

一个人的身影 2020-12-01 03:28

Is there a way to use the numpy.percentile function to compute weighted percentile? Or is anyone aware of an alternative python function to compute weighted percentile?

12 Answers
  • 2020-12-01 04:18

    A quick solution, by first sorting and then interpolating along the weighted CDF:

    import numpy as np

    def weighted_percentile(data, percents, weights=None):
        '''percents in units of 1%
           weights specifies the frequency (count) of data.
        '''
        if weights is None:
            return np.percentile(data, percents)
        ind = np.argsort(data)
        d = data[ind]
        w = weights[ind]
        # cumulative weights, normalised to percentages, form the weighted CDF
        p = 100.0 * w.cumsum() / w.sum()
        return np.interp(percents, p, d)
    
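    A minimal sketch of the same sort-and-interpolate idea on hypothetical data, computing a weighted median:

```python
import numpy as np

# Hypothetical data: four values, the last one five times as frequent.
data = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([1.0, 1.0, 1.0, 5.0])

ind = np.argsort(data)
d, w = data[ind], weights[ind]
p = 100.0 * w.cumsum() / w.sum()   # weighted CDF in percent: [12.5, 25, 37.5, 100]
median = np.interp(50, p, d)       # weighted median by linear interpolation
print(median)
```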
  • 2020-12-01 04:21

    I don't know exactly what "weighted percentile" means here, but from @Joan Smith's answer it seems you just need to repeat every element in ar according to its weight, which numpy.repeat() does directly:

    import numpy as np
    np.repeat([1,2,3], [4,5,6])
    

    The result is:

    array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3])
    
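    Combining this with np.percentile gives a complete recipe (integer weights only), using the same hypothetical arrays:

```python
import numpy as np

# Integer frequency weights: each value appears `weight` times.
ar = np.array([1, 2, 3])
weights = np.array([4, 5, 6])

expanded = np.repeat(ar, weights)      # 15 elements in total
median = np.percentile(expanded, 50)
print(median)                          # 2.0
```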
  • 2020-12-01 04:22

    As mentioned in the comments, simply repeating values is impossible for float weights and impractical for very large datasets. There is a library that computes weighted percentiles here: http://kochanski.org/gpk/code/speechresearch/gmisclib/gmisclib.weighted_percentile-module.html It worked for me.

  • 2020-12-01 04:25

    Here is my solution:

    import numpy as np

    def my_weighted_perc(data, perc, weights=None):
        if weights is None:
            return np.nanpercentile(data, perc)
        # keep only entries where both the value and its weight are not NaN
        mask = (~np.isnan(data)) & (~np.isnan(weights))
        d = data[mask]
        wei = weights[mask]
        ix = np.argsort(d)
        d = d[ix]
        wei = wei[ix]
        # weighted CDF in percent
        wei_cum = 100. * np.cumsum(wei) / np.sum(wei)
        return np.interp(perc, wei_cum, d)

    It simply computes the weighted CDF of the data and then uses it to estimate the weighted percentiles.
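    A short sketch of the same NaN-filtering CDF computation on hypothetical data:

```python
import numpy as np

# Hypothetical data containing a NaN that must be dropped first.
data = np.array([np.nan, 1.0, 2.0, 3.0])
weights = np.array([2.0, 1.0, 1.0, 1.0])

mask = ~np.isnan(data) & ~np.isnan(weights)
d, w = data[mask], weights[mask]
ix = np.argsort(d)
d, w = d[ix], w[ix]
cdf = 100.0 * np.cumsum(w) / np.sum(w)   # [33.3..., 66.6..., 100.0]
result = np.interp(100, cdf, d)          # 100th percentile -> largest value
print(result)
```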

  • 2020-12-01 04:26

    Unfortunately, numpy doesn't have built-in weighted functions for everything, but you can always put something together.

    def weight_array(ar, weights):
        # repeat each value according to its (integer) weight
        weighted = []
        for a, w in zip(ar, weights):
            weighted.extend([a] * w)
        return weighted

    np.percentile(weight_array(ar, weights), 25)
    
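    For example, with small hypothetical values and integer weights:

```python
import numpy as np

def weight_array(ar, weights):
    # Repeat each value `w` times (works for integer weights only).
    weighted = []
    for a, w in zip(ar, weights):
        weighted.extend([a] * w)
    return weighted

ar = [10, 20, 30]
weights = [1, 2, 1]
q25 = np.percentile(weight_array(ar, weights), 25)   # percentile of [10, 20, 20, 30]
print(q25)
```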
  • 2020-12-01 04:26

    I use this function for my needs:

    import numpy

    def quantile_at_values(values, population, weights=None):
        values = numpy.atleast_1d(values).astype(float)
        population = numpy.atleast_1d(population).astype(float)
        # if no weights are given, use equal weights
        if weights is None:
            weights = numpy.ones(population.shape).astype(float)
            normal = float(len(weights))
        # else, check weights                  
        else:                                           
            weights = numpy.atleast_1d(weights).astype(float)
            assert len(weights) == len(population)
            assert (weights >= 0).all()
            normal = numpy.sum(weights)                    
            assert normal > 0.
        quantiles = numpy.array([numpy.sum(weights[population <= value]) for value in values]) / normal
        assert (quantiles >= 0).all() and (quantiles <= 1).all()
        return quantiles
    
    • It is vectorized as far as I could go.
    • It has a lot of sanity checks.
    • It works with floats as weights.
    • It can work without weights (→ equal weights).
    • It can compute multiple quantiles at once.

    Multiply results by 100 if you want percentiles instead of quantiles.
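    The core weighted-ECDF evaluation the function performs reduces to a one-liner (hypothetical data):

```python
import numpy as np

# Hypothetical population where the value 4.0 carries half the total weight.
population = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([1.0, 1.0, 1.0, 3.0])
values = np.array([2.0, 4.0])

normal = weights.sum()
# fraction of total weight at or below each query value
quantiles = np.array([weights[population <= v].sum() for v in values]) / normal
print(quantiles * 100)   # percentiles at 2.0 and 4.0
```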
