Frequency tables in pandas (like plyr in R)

不思量自难忘° 2020-12-28 18:53

My problem is how to calculate frequencies on multiple variables in pandas. I have this dataframe:

d1 = pd.DataFrame( {'StudentID': ["x1", "x10


        
4 answers
  •  攒了一身酷
    2020-12-28 19:27

    There is another approach that I like to use for similar problems. It uses groupby and unstack:

    import pandas as pd

    d1 = pd.DataFrame({'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
                       'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
                       'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
                       'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
                       'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
                       'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes']},
                      columns=['StudentID', 'StudentGender', 'ExamenYear', 'Exam', 'Participated', 'Passed'])
    

    (this is just the raw data from above)

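    # Per year: count each answer, pivot the answers into columns, keep the 'yes' counts
    d2 = d1.groupby("ExamenYear").Participated.value_counts().unstack(fill_value=0)['yes']
    d3 = d1.groupby("ExamenYear").Passed.value_counts().unstack(fill_value=0)['yes']
    # Name each Series so the names become the column labels of the combined frame
    d2.name = "Participated"
    d3.name = "Passed"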
    
    pd.DataFrame(data=[d2,d3]).T
                Participated  Passed
    ExamenYear                      
    2007                   2       2
    2008                   3       3
    2009                   3       2
    

    This solution is slightly more cumbersome than the one above using apply, but I find it easier to understand and extend.
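Since the same groupby / value_counts / unstack chain is repeated for each column, one way to extend it is to wrap the chain in a small helper and loop over the columns of interest. The following is only a sketch under a few assumptions: the function name `count_yes` and its parameters are hypothetical (they do not appear in the answer), and each counted column is assumed to contain at least one 'yes', since otherwise the `['yes']` lookup would raise a `KeyError`.

```python
import pandas as pd

# Same raw data as in the answer above.
d1 = pd.DataFrame({
    'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
    'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})


def count_yes(df, by, columns):
    """Hypothetical helper: per-group count of 'yes' for each column in `columns`."""
    counts = {
        # Same chain as in the answer: count values per group, pivot the
        # values into columns (missing combinations become 0), keep 'yes'.
        col: df.groupby(by)[col].value_counts().unstack(fill_value=0)['yes']
        for col in columns
    }
    # The dict keys become the column labels; the group keys become the index.
    return pd.DataFrame(counts)


summary = count_yes(d1, "ExamenYear", ["Participated", "Passed"])
print(summary)
```

With the data above, `summary` holds the same counts as the table in the answer. Adding another yes/no column to the result only requires appending its name to the list passed as `columns`.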
