pandas describe by - additional parameters

既然无缘 2021-02-06 11:20

I see that the pandas library has a describe function which returns some useful statistics. However, is there a way to add additional rows to the output such as

3 Answers
  •  萌比男神i
    2021-02-06 11:50

    The default describe looks like this:

    import numpy as np
    import pandas as pd

    np.random.seed([3,1415])
    df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))
    
    df.describe()
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
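
If the extra rows you want are just other percentiles, describe already accepts a percentiles argument, so no custom function is needed for that case. A minimal sketch, assuming a fresh random frame (the seed and the chosen percentiles are illustrative, not from the answer above):

```python
import numpy as np
import pandas as pd

# Illustrative data only; the seed differs from the one used in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=list("ABCDE"))

# Ask describe() for the 5th and 95th percentiles instead of the quartiles.
# The median (50%) row is always included.
summary = df.describe(percentiles=[0.05, 0.5, 0.95])
print(summary)
```

This only covers percentiles; statistics such as skew or kurtosis still need to be appended separately.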
    

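    Updated for pandas 0.20
    I'd write my own describe, as below. To add more statistics, pass more names in the stats list.

    def describe(df, stats):
        # Written for pandas 0.20: reindex_axis, DataFrame.append, and the 'mad'
        # aggregation were all removed in later pandas releases.
        d = df.describe()
        # Aggregate only the columns describe() kept, then add the extra rows below.
        return d.append(df.reindex_axis(d.columns, 1).agg(stats))
    
    describe(df, ['skew', 'mad', 'kurt'])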
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
    skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
    mad      0.267730    0.249968    0.254351    0.228558    0.242874
    kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
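
On pandas 2.x the function above no longer runs as written, because DataFrame.append, reindex_axis, and the built-in mad were removed. A rough equivalent is sketched below, assuming pd.concat in place of append and plain column selection in place of reindex_axis. The names describe_plus and mean_abs_dev are hypothetical helpers introduced here, and the data uses a different seed, so the numbers will not match the table above.

```python
import numpy as np
import pandas as pd


def mean_abs_dev(s):
    """Mean absolute deviation around the mean (stand-in for the removed Series.mad)."""
    return (s - s.mean()).abs().mean()


def describe_plus(df, stats):
    """Return df.describe() with one extra row per entry in stats."""
    d = df.describe()
    # Restrict to the columns describe() kept (numeric ones by default).
    extra = df[d.columns].agg(stats)
    # pd.concat replaces the removed DataFrame.append.
    return pd.concat([d, extra])


# Illustrative data only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=list("ABCDE"))

out = describe_plus(df, ["skew", mean_abs_dev, "kurt"])
print(out)
```

Callables passed in stats show up in the index under their function name, so the extra rows here are labelled skew, mean_abs_dev, and kurt.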
    

    Old Answer

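    def describe(df):
        # Transpose so each statistic is a column, add the extra statistics
        # as new columns, then transpose back so they appear as rows.
        # Note: df.mad() was removed in pandas 2.0.
        return pd.concat([df.describe().T,
                          df.mad().rename('mad'),
                          df.skew().rename('skew'),
                          df.kurt().rename('kurt'),
                         ], axis=1).T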
    
    describe(df)
    
                    A           B           C           D           E
    count  100.000000  100.000000  100.000000  100.000000  100.000000
    mean     0.495871    0.472939    0.455570    0.503899    0.451341
    std      0.303589    0.291968    0.294984    0.269936    0.284666
    min      0.006453    0.001559    0.001068    0.015311    0.009526
    25%      0.239379    0.219141    0.196251    0.294371    0.202956
    50%      0.529596    0.456548    0.376558    0.532002    0.432936
    75%      0.759452    0.739666    0.665563    0.730702    0.686793
    max      0.999799    0.994510    0.997271    0.981551    0.979221
    mad      0.267730    0.249968    0.254351    0.228558    0.242874
    skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
    kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
    
