I am still new to Python pandas\' pivot_table and would like to ask a way to count frequencies of values in one column, which is also linked to another column of ID. The Dat
You can use count
df.pivot_table(index='Account_number', columns='Product', aggfunc='count')
In new version of Pandas, slight modification is required. I had to spend some time figuring out so just wanted to add that here so that someone can directly use this.
df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
fill_value=0)
Solution: Use aggfunc='size'
Using aggfunc=len
or aggfunc='count'
like all the other answers on this page will not work for DataFrames with more than three columns. By default, pandas will apply this aggfunc
to all the columns not found in index
or columns
parameters.
For instance, if we had two more columns in our original DataFrame defined like this:
df = pd.DataFrame({'Account_number':[1, 1, 2 ,2 ,2 ,3 ,3],
'Product':['A', 'A', 'A', 'B', 'B','A', 'B'],
'Price': [10] * 7,
'Quantity': [100] * 7})
Output:
Account_number Product Price Quantity
0 1 A 10 100
1 1 A 10 100
2 2 A 10 100
3 2 B 10 100
4 2 B 10 100
5 3 A 10 100
6 3 B 10 100
If you apply the current solutions to this DataFrame, you would get the following:
df.pivot_table(index='Account_number',
columns='Product',
aggfunc=len,
fill_value=0)
Output:
Price Quantity
Product A B A B
Account_number
1 2 0 2 0
2 1 2 1 2
3 1 1 1 1
Instead, use aggfunc='size'
. Since size
always returns the same number for each column, pandas does not call it on every single column and just does it once.
df.pivot_table(index='Account_number',
columns='Product',
aggfunc='size',
fill_value=0)
Output:
Product A B
Account_number
1 2 0
2 1 2
3 1 1
You need to specify the aggfunc
as len
:
In [11]: df.pivot_table(index='Account_number', columns='Product',
aggfunc=len, fill_value=0)
Out[11]:
Product A B
Account_number
1 2 0
2 1 2
3 1 1
It looks like count, is counting the instances of each column (Account_number
and Product
), it's not clear to me whether this is a bug...