Calculate Arbitrary Percentile on Pandas GroupBy

-上瘾入骨i 2020-12-14 02:09

Currently there is a median method on pandas GroupBy objects.

Is there a way to calculate an arbitrary percentile (s…

3 Answers
  • 2020-12-14 02:11

    You want the quantile method:

    In [47]: df
    Out[47]: 
               A         B    C
    0   0.719391  0.091693  one
    1   0.951499  0.837160  one
    2   0.975212  0.224855  one
    3   0.807620  0.031284  one
    4   0.633190  0.342889  one
    5   0.075102  0.899291  one
    6   0.502843  0.773424  one
    7   0.032285  0.242476  one
    8   0.794938  0.607745  one
    9   0.620387  0.574222  one
    10  0.446639  0.549749  two
    11  0.664324  0.134041  two
    12  0.622217  0.505057  two
    13  0.670338  0.990870  two
    14  0.281431  0.016245  two
    15  0.675756  0.185967  two
    16  0.145147  0.045686  two
    17  0.404413  0.191482  two
    18  0.949130  0.943509  two
    19  0.164642  0.157013  two
    
    In [48]: df.groupby('C').quantile(.95)
    Out[48]: 
                A         B
    C                      
    one  0.964541  0.871332
    two  0.826112  0.969558
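The session above does not show how df was built, so here is a minimal, self-contained sketch on a small hypothetical frame (same column names, made-up values). It also shows that quantile accepts a list of quantiles, which returns one row per (group, quantile) pair, and that quantile(0.5) agrees with the median method mentioned in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the random frame above:
# two groups ('one', 'two') and two numeric columns (A, B)
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    'B': [0.5, 0.25, 1.0, 0.75, 5.0, 2.5, 10.0, 7.5],
    'C': ['one'] * 4 + ['two'] * 4,
})

grouped = df.groupby('C')

# One quantile: one row per group, one column per numeric column
p95 = grouped.quantile(0.95)

# Several quantiles at once: rows are indexed by (group, quantile)
several = grouped.quantile([0.25, 0.5, 0.95])

# The 0.5 quantile should agree with the built-in median
same_as_median = np.allclose(grouped.quantile(0.5), grouped.median())

print(p95)
print(several)
print(same_as_median)
```

By default quantile uses linear interpolation between the two nearest values in each group, so the 0.95 quantile of [1, 2, 3, 4] is 3.85.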
    
  • 2020-12-14 02:11

    With pandas >= 0.25.0 you can also use named aggregation.

    For example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': np.random.randint(1, 3, size=100),
                       'C': np.random.randn(100)})
    # Named aggregation: output_column=(input_column, aggfunc)
    df.groupby('A').agg(min_val=('C', 'min'),
                        percentile_80=('C', lambda x: x.quantile(0.8)))
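If you need several percentile columns, you might build the named aggregations in a loop. A minimal sketch on hypothetical deterministic data: each lambda binds q as a default argument (q=q), because Python closures look variables up late and, without it, every lambda would use the last q of the loop. The column names percentile_10 and so on are illustrative choices.

```python
import pandas as pd

# Hypothetical data: two groups with three values each
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                   'C': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]})

quantiles = [0.1, 0.5, 0.9]

# One named aggregation per quantile; q=q freezes the current loop
# value inside each lambda instead of sharing the final one
aggs = {
    f'percentile_{round(q * 100)}': ('C', lambda x, q=q: x.quantile(q))
    for q in quantiles
}

result = df.groupby('A').agg(min_val=('C', 'min'), **aggs)
print(result)
```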
    
  • 2020-12-14 02:24

    I found another useful solution here.

    If I have to use groupby, another approach is:

    import numpy as np

    def percentile(n):
        def percentile_(x):
            return np.percentile(x, n)
        # agg() labels the result column with the function's __name__
        percentile_.__name__ = 'percentile_%s' % n
        return percentile_
    

    Using the call below, I get the same result as the solution given by @TomAugspurger.

    df.groupby('C').agg([percentile(50), percentile(95)])
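To check that this matches the quantile answer, here is a self-contained sketch on hypothetical data. It repeats the percentile helper so it runs on its own and compares its output with GroupBy.quantile. np.percentile and quantile both default to linear interpolation, so the values agree; the __name__ assignment is what gives the result columns distinct labels such as percentile_50 and percentile_95 (without it, both inner functions would be named percentile_).

```python
import numpy as np
import pandas as pd

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    # agg() uses __name__ to label the resulting column
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Hypothetical data with the same layout as the first answer's frame
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    'B': [0.5, 0.25, 1.0, 0.75, 5.0, 2.5, 10.0, 7.5],
    'C': ['one'] * 4 + ['two'] * 4,
})

via_numpy = df.groupby('C').agg([percentile(50), percentile(95)])
via_quantile = df.groupby('C').quantile(0.95)

# Columns form a MultiIndex: (original column, function name)
print(via_numpy.columns.tolist())

# The 95th percentile from np.percentile should match quantile(0.95)
print(np.allclose(via_numpy.xs('percentile_95', axis=1, level=1), via_quantile))
```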
