Frequency tables in pandas (like plyr in R)

不思量自难忘° 2020-12-28 18:53

My problem is how to calculate frequencies across multiple variables in pandas. I start from this dataframe:

d1 = pd.DataFrame({'StudentID': ["x1", "x10
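The definition above is cut off in the page. A possible reconstruction is sketched below; the columns follow the names used in the answer, and the remaining values are an assumption chosen so that the grouped counts match the result tables shown further down.

```python
import pandas as pd

# Hypothetical reconstruction of the truncated frame: column names come from
# the answer's code, the values are assumptions consistent with its output
# (3/4/3 students per year, 2/3/3 participated, 2/3/2 passed).
d1 = pd.DataFrame({
    'StudentID':    ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
    'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
    'ExamenYear':   ['2007', '2007', '2007', '2008', '2008',
                     '2008', '2008', '2009', '2009', '2009'],
    'Examen':       ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed':       ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'no', 'yes'],
})
```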


        
4 Answers
  •  感情败类
    2020-12-28 19:41

    I finally decided to use apply.

    I am posting what I came up with hoping that it can be useful for others.

    From what I understand from Wes McKinney's book "Python for Data Analysis":

    • apply is more flexible than agg and transform because you can define your own function.
    • The only requirement is that the function returns a pandas object or a scalar value.
    • The inner mechanics: the function is called on each piece of the grouped object, and the results are glued together using pandas.concat.
    • One needs to "hard-code" the structure you want in the result.

    Here is what I came up with:

    def ZahlOccurence_0(x):
        return pd.Series({'All': len(x['StudentID']),
                          'Part': sum(x['Participated'] == 'yes'),
                          'Pass': sum(x['Passed'] == 'yes')})
    

    When I run it:

     d1.groupby('ExamenYear').apply(ZahlOccurence_0)
    

    I get the correct results:

                All  Part  Pass
    ExamenYear                 
    2007          3     2     2
    2008          4     3     3
    2009          3     3     2
    
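    As a side note, for plain frequency tables of one variable against another (what plyr's count roughly does), pandas also has pd.crosstab. A minimal sketch, assuming d1 carries the question's ExamenYear and Participated columns:

```python
import pandas as pd

# Stand-in data with the question's columns (an assumption).
d1 = pd.DataFrame({
    'ExamenYear':   ['2007'] * 3 + ['2008'] * 4 + ['2009'] * 3,
    'Participated': ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
})

# crosstab builds the year x answer frequency table directly.
tab = pd.crosstab(d1['ExamenYear'], d1['Participated'])
print(tab)
```

    This gives one row per year and one column per answer ('no'/'yes'), without writing a custom function.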

    This approach also allows me to combine frequencies with other statistics:

    import numpy as np
    d1['testValue'] = np.random.randn(len(d1))
    
    def ZahlOccurence_1(x):
        return pd.Series({'All': len(x['StudentID']),
                          'Part': sum(x['Participated'] == 'yes'),
                          'Pass': sum(x['Passed'] == 'yes'),
                          'test': x['testValue'].mean()})
    
    
    d1.groupby('ExamenYear').apply(ZahlOccurence_1)
    
    
                All  Part  Pass      test
    ExamenYear                           
    2007          3     2     2  0.358702
    2008          4     3     3  1.004504
    2009          3     3     2  0.521511
    
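    For completeness, the same summary can also be sketched with named aggregation in groupby.agg, a feature from pandas versions newer than the book; the stand-in frame below is an assumption with the question's columns:

```python
import pandas as pd
import numpy as np

# Stand-in frame with the question's columns (an assumption).
d1 = pd.DataFrame({
    'StudentID':    [f"x{i}" for i in range(1, 11)],
    'ExamenYear':   ['2007'] * 3 + ['2008'] * 4 + ['2009'] * 3,
    'Participated': ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed':       ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'no', 'yes'],
    'testValue':    np.random.randn(10),
})

# Named aggregation: one (column, function) pair per output column,
# so the result structure is declared up front instead of in a Series.
summary = d1.groupby('ExamenYear').agg(
    All=('StudentID', 'size'),
    Part=('Participated', lambda s: (s == 'yes').sum()),
    Pass=('Passed', lambda s: (s == 'yes').sum()),
    test=('testValue', 'mean'),
)
print(summary)
```

    Unlike apply, this avoids calling a hand-written function on each group piece, though apply remains more flexible when the result is not one scalar per column.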

    I hope someone else will find this useful.
