df2 = pd.DataFrame({'X': ['X1', 'X1', 'X1', 'X1'], 'Y': ['Y2', 'Y1', 'Y1', 'Y1'], 'Z': ['Z3', 'Z1', 'Z1', 'Z2']})
    X   Y   Z
0  X1  Y2  Z3
1  X1  Y1  Z1
2  X1  Y1  Z1
3  X1  Y1  Z2
For best performance, I recommend calling DataFrame.drop_duplicates first and then pivoting with aggfunc='count'.
Others are correct that aggfunc=pd.Series.nunique will work. It can be slow, however, if the number of index groups is large (>1000).
So instead of (to quote @Javier)
df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)
I suggest
df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')
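This works because dropping duplicates first guarantees that every subgroup (each combination of 'Y' and 'Z') contains only unique (non-duplicate) values of 'X', so counting the rows in a subgroup gives the same result as counting its distinct values.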
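To check that the two approaches agree, here is a minimal sketch using the df2 from above. The variable names via_nunique, via_count and same are only illustrative, and the keyword arguments (values, index, columns) spell out what the positional calls above pass. Empty cells come back as NaN in both tables, so they are filled with 0 before comparing.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Approach 1: count the distinct X values in each (Y, Z) cell directly.
via_nunique = df2.pivot_table(
    values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique
)

# Approach 2: drop duplicate (X, Y, Z) rows first, then count the rows left.
via_count = (
    df2.drop_duplicates(['X', 'Y', 'Z'])
       .pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
)

# Cells with no data are NaN in both tables; fill them with 0 so the
# two results can be compared as integers.
same = via_nunique.fillna(0).astype(int).equals(via_count.fillna(0).astype(int))
print(same)
```

On a frame this small there is no measurable speed difference. The point of the comparison is only that the cheaper 'count' aggregation yields the same table once the duplicates are gone.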