Is there a way to use the numpy.percentile function to compute weighted percentile? Or is anyone aware of an alternative python function to compute weighted percentile?
A quick solution, by first sorting and then interpolating:
import numpy as np

def weighted_percentile(data, percents, weights=None):
    '''percents in units of 1%
    weights specifies the frequency (count) of data.
    '''
    if weights is None:
        return np.percentile(data, percents)
    ind = np.argsort(data)
    d = data[ind]
    w = weights[ind]
    # weighted CDF in percent, then interpolate the requested percentiles
    p = 1. * w.cumsum() / w.sum() * 100
    y = np.interp(percents, p, d)
    return y
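For example, a minimal sketch with made-up numbers (the arrays are only illustrative):

import numpy as np

data = np.array([1.0, 2.0, 3.0])
weights = np.array([1.0, 1.0, 2.0])   # the value 3.0 counts twice

# weighted median; with weights=None this falls back to np.percentile
print(weighted_percentile(data, 50, weights))   # -> 2.0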
I don't know exactly what a weighted percentile means, but from @Joan Smith's answer it seems you just need to repeat every element in ar according to its weight, which you can do with numpy.repeat():
import numpy as np
np.repeat([1,2,3], [4,5,6])
the result is:
array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3])
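With integer weights you can then (as a rough sketch, variable names are just illustrative) take an ordinary percentile of the repeated array:

import numpy as np

data = np.array([1, 2, 3])
weights = np.array([4, 5, 6])   # must be integer counts

# expand each value according to its count, then take a plain percentile
expanded = np.repeat(data, weights)
print(np.percentile(expanded, 50))   # weighted median of the expanded sample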
As mentioned in the comments, simply repeating values is impossible for float weights and impractical for very large datasets. There is a library that computes weighted percentiles here: http://kochanski.org/gpk/code/speechresearch/gmisclib/gmisclib.weighted_percentile-module.html It worked for me.
Here is my solution:

import numpy as np

def my_weighted_perc(data, perc, weights=None):
    if weights is None:
        return np.nanpercentile(data, perc)
    else:
        # keep only entries where both the value and its weight are valid
        ok = (~np.isnan(data)) & (~np.isnan(weights))
        d = data[ok]
        wei = weights[ok]
        ix = np.argsort(d)
        d = d[ix]
        wei = wei[ix]
        # weighted CDF in percent
        wei_cum = 100. * np.cumsum(wei) / np.sum(wei)
        return np.interp(perc, wei_cum, d)
It simply calculates the weighted CDF of the data and then uses it to estimate the weighted percentiles.
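A quick sanity check on invented data (the numbers are only illustrative; NaN entries are dropped before the weighted CDF is built):

import numpy as np

data = np.array([1.0, 2.0, 3.0, np.nan])
weights = np.array([1.0, 1.0, 2.0, 1.0])

print(my_weighted_perc(data, 50, weights))   # -> 2.0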
Unfortunately, numpy doesn't have built-in weighted functions for everything, but you can always put something together.
def weight_array(ar, weights):
    # repeat each value weights[i] times (weights must be non-negative integers)
    zipped = zip(ar, weights)
    weighted = []
    for a, w in zipped:
        for j in range(w):
            weighted.append(a)
    return weighted
np.percentile(weight_array(ar, weights), 25)
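For instance, with small made-up arrays (names are just illustrative):

import numpy as np

ar = [1, 2, 3]
weights = [4, 5, 6]   # integer counts

# 25th percentile of [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]
print(np.percentile(weight_array(ar, weights), 25))   # -> 1.5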
I use this function for my needs:
import numpy

def quantile_at_values(values, population, weights=None):
    values = numpy.atleast_1d(values).astype(float)
    population = numpy.atleast_1d(population).astype(float)
    # if no weights are given, use equal weights
    if weights is None:
        weights = numpy.ones(population.shape).astype(float)
        normal = float(len(weights))
    # else, check weights
    else:
        weights = numpy.atleast_1d(weights).astype(float)
        assert len(weights) == len(population)
        assert (weights >= 0).all()
        normal = numpy.sum(weights)
        assert normal > 0.
    quantiles = numpy.array([numpy.sum(weights[population <= value]) for value in values]) / normal
    assert (quantiles >= 0).all() and (quantiles <= 1).all()
    return quantiles
Multiply results by 100 if you want percentiles instead of quantiles.
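As a rough illustration (the arrays below are invented), the function returns, for each query value, the fraction of the total weight at or below it:

import numpy

population = numpy.array([1.0, 2.0, 3.0, 4.0])
weights = numpy.array([1.0, 1.0, 1.0, 2.0])

print(quantile_at_values([2.0, 3.5], population, weights))         # -> [0.4 0.6]
print(100 * quantile_at_values([2.0, 3.5], population, weights))   # as percentiles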