Python Pandas : pivot table with aggfunc = count unique distinct

谎友^ 2020-12-07 13:02
df2 = pd.DataFrame({'X': ['X1', 'X1', 'X1', 'X1'], 'Y': ['Y2', 'Y1', 'Y1', 'Y1'], 'Z': ['Z3', 'Z1', 'Z1', 'Z2']})

    X   Y   Z
0  X1  Y2  Z3
1  X1  Y1  Z1
2  X1  Y1  Z1
3  X1  Y1  Z2


        
8 Answers
  •  北海茫月
    2020-12-07 13:44

    For best performance I recommend doing DataFrame.drop_duplicates followed by aggfunc='count'.

    Others are correct that aggfunc=pd.Series.nunique will work. This can be slow, however, if the number of index groups you have is large (>1000).

    So instead of (to quote @Javier)

    df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)
    

    I suggest

    df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')
    

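    This works because dropping duplicate rows guarantees that every subgroup (each combination of ('Y', 'Z')) contains only unique (non-duplicate) values of 'X', so counting the remaining rows gives the same result as counting distinct values.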
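
As a rough check of the equivalence described above, the sketch below runs both approaches on the question's df2 plus one extra row. The extra row ('X2', 'Y1', 'Z1') and the name demo are assumptions added purely for illustration (they are not in the original question), so that one cell holds two distinct 'X' values.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Hypothetical extra row (not in the original question) so that the
# (Y1, Z1) cell contains two distinct 'X' values: X1 and X2.
extra = pd.DataFrame({'X': ['X2'], 'Y': ['Y1'], 'Z': ['Z1']})
demo = pd.concat([df2, extra], ignore_index=True)

# Approach 1: count distinct 'X' values per (Y, Z) cell directly.
by_nunique = demo.pivot_table(values='X', index='Y', columns='Z',
                              aggfunc=pd.Series.nunique)

# Approach 2: drop duplicate rows first, then a plain count is enough.
by_count = (
    demo.drop_duplicates(['X', 'Y', 'Z'])
        .pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
)

# Empty cells are NaN in both tables; fill them before comparing.
same = by_nunique.fillna(0).astype(int).equals(by_count.fillna(0).astype(int))
print(by_count)
print(same)
```

Keyword arguments (values, index, columns) are used here instead of the positional form only to make the roles of 'X', 'Y' and 'Z' explicit. Which approach is faster depends on the data, so it is worth timing both on your own frame.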
