pandas: find percentile stats of a given column

不思量自难忘° 2020-12-04 19:16

I have a pandas data frame my_df, where I can find the mean(), median(), mode() of a given column:

my_df['field_A'].mean()
my_df['field_A'].median()
my_d         


        
4 Answers
  • 2020-12-04 19:47

    I figured out that the following works:

    my_df.dropna().quantile([0.0, .9])
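Since the question is about a single column, the same call can be made on just that column. The sketch below uses a small made-up frame (the `name` column and the values are hypothetical, added only for illustration) and selects `field_A` before calling `quantile()`:

```python
import pandas as pd

# Hypothetical data, not from the original question.
my_df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],  # non-numeric column, left out of the calculation
    "field_A": [1, 2, 3, 4, 5],
})

# Selecting the column first returns a Series indexed by the requested quantiles.
col_q = my_df["field_A"].quantile([0.0, 0.9])
print(col_q)
```

Passing a list returns one value per requested quantile; passing a single number returns a scalar.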
    
  • 2020-12-04 19:49

    You can use the quantile() method, which is available on both pandas.Series and pandas.DataFrame, as shown below.

    import pandas as pd
    import random
    
    A = [ random.randint(0,100) for i in range(10) ]
    B = [ random.randint(0,100) for i in range(10) ]
    
    df = pd.DataFrame({ 'field_A': A, 'field_B': B })
    df
    #    field_A  field_B
    # 0       90       72
    # 1       63       84
    # 2       11       74
    # 3       61       66
    # 4       78       80
    # 5       67       75
    # 6       89       47
    # 7       12       22
    # 8       43        5
    # 9       30       64
    
    df.field_A.mean()   # Same as df['field_A'].mean()
    # 54.399999999999999
    
    df.field_A.median() 
    # 62.0
    
    # Call `quantile(q)` to get the q-th quantile, where `q` is a
    # fraction between 0 and 1 (e.g. 0.9 for the 90th percentile).
    
    df.field_A.quantile(0.1) # 10th percentile
    # 11.9
    
    df.field_A.quantile(0.5) # same as median
    # 62.0
    
    df.field_A.quantile(0.9) # 90th percentile
    # 89.10000000000001
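If the mean, median and several percentiles are all needed at once, `describe()` with its `percentiles` argument may be more convenient than separate calls. A minimal sketch, reusing the `field_A` values printed above (the variable names `field_a` and `summary` are just for illustration):

```python
import pandas as pd

# Values copied from the field_A column printed in the answer above.
field_a = pd.Series([90, 63, 11, 61, 78, 67, 89, 12, 43, 30], name="field_A")

# describe() reports count, mean, std, min and max,
# plus one row per requested percentile, labelled like "10%".
summary = field_a.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# Individual statistics can be read back by label.
p10 = summary["10%"]
p90 = summary["90%"]
```

The percentile rows should match the individual `quantile()` calls above, since `describe()` uses the same linear interpolation by default.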
    
  • 2020-12-04 19:55

    You can also pass several columns that contain null values and request several quantiles at once (I use the 95th percentile for outlier treatment):

    my_df[['field_A','field_B']].dropna().quantile([0.0, .5, .90, .95])
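One thing to be aware of: `quantile()` already skips NaN values within each column, while `dropna()` on a DataFrame removes every row that has a NaN in any selected column. The two can therefore give different answers. A small sketch with made-up data (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "field_A": [1.0, 2.0, 3.0, 4.0, np.nan],
    "field_B": [10.0, np.nan, 30.0, 40.0, 50.0],
})
cols = ["field_A", "field_B"]

# NaN is ignored column by column: each column keeps its 4 valid values.
per_column = df[cols].quantile([0.5, 0.95])

# dropna() first removes rows 1 and 4 entirely, leaving 3 rows for both columns.
complete_rows = df[cols].dropna().quantile([0.5, 0.95])

print(per_column)
print(complete_rows)
```

Which variant is appropriate depends on whether the statistics should describe each column on its own or only the rows where all selected columns are present.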
    
  • 2020-12-04 20:02

    Assume a Series s:

    import numpy as np
    import pandas as pd

    s = pd.Series(np.arange(100))
    

    Get the quantiles for [.1, .2, .3, .4, .5, .6, .7, .8, .9]:

    s.quantile(np.linspace(.1, 1, 9, endpoint=False))
    
    0.1     9.9
    0.2    19.8
    0.3    29.7
    0.4    39.6
    0.5    49.5
    0.6    59.4
    0.7    69.3
    0.8    79.2
    0.9    89.1
    dtype: float64
    

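    Or, with 'lower' interpolation, so that each result is an actual value from the Series:

    s.quantile(np.linspace(.1, 1, 9, endpoint=False), interpolation='lower')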
    
    0.1     9
    0.2    19
    0.3    29
    0.4    39
    0.5    49
    0.6    59
    0.7    69
    0.8    79
    0.9    89
    dtype: int32
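The difference between the two outputs above comes from the `interpolation` argument. As a rough sketch, the loop below compares the options that `Series.quantile()` accepts for the 10th percentile of the same Series (the names `methods` and `results` are just for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100))

# The 0.1 quantile falls at position 0.1 * (100 - 1) = 9.9,
# i.e. between the values 9 and 10.
methods = ["linear", "lower", "higher", "midpoint", "nearest"]
results = {m: s.quantile(0.1, interpolation=m) for m in methods}

for method, value in results.items():
    print(f"{method:>8}: {value}")
```

Here `linear` interpolates between the two neighbours, `lower` and `higher` pick one of them, `midpoint` averages them, and `nearest` picks whichever neighbour is closer to the computed position.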
    