df2 = pd.DataFrame({\'X\' : [\'X1\', \'X1\', \'X1\', \'X1\'], \'Y\' : [\'Y2\',\'Y1\',\'Y1\',\'Y1\'], \'Z\' : [\'Z3\',\'Z1\',\'Z1\',\'Z2\']})
X Y Z
0 X1 Y2
Since none of the answers are up to date with the last version of Pandas, I am writing another solution for this problem:
In [1]:
import pandas as pd
# Set exemple
df2 = pd.DataFrame({'X' : ['X1', 'X1', 'X1', 'X1'], 'Y' : ['Y2','Y1','Y1','Y1'], 'Z' : ['Z3','Z1','Z1','Z2']})
# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)
Out [1]:
Z Z1 Z2 Z3
Y
Y1 1.0 1.0 NaN
Y2 NaN NaN 1.0
aggfunc=pd.Series.nunique
will only count unique values for a series - in this case count the unique values for a column. But this doesn't quite reflect as an alternative to aggfunc='count'
For simple counting, it better to use aggfunc=pd.Series.count
Do you mean something like this?
In [39]: df2.pivot_table(values='X', rows='Y', cols='Z',
aggfunc=lambda x: len(x.unique()))
Out[39]:
Z Z1 Z2 Z3
Y
Y1 1 1 NaN
Y2 NaN NaN 1
Note that using len
assumes you don't have NA
s in your DataFrame. You can do x.value_counts().count()
or len(x.dropna().unique())
otherwise.
This is a good way of counting entries within .pivot_table
:
df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')
X1 X2
Y Z
Y1 Z1 1 1
Z2 1 NaN
Y2 Z3 1 NaN
aggfunc=pd.Series.nunique
provides distinct count.
Full Code:
df2.pivot_table(values='X', rows='Y', cols='Z',
aggfunc=pd.Series.nunique)
Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.
For best performance I recommend doing DataFrame.drop_duplicates
followed up aggfunc='count'
.
Others are correct that aggfunc=pd.Series.nunique
will work. This can be slow, however, if the number of index
groups you have is large (>1000).
So instead of (to quote @Javier)
df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)
I suggest
df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')
This works because it guarantees that every subgroup (each combination of ('Y', 'Z')
) will have unique (non-duplicate) values of 'X'
.