Here is a pandas DataFrame I would like to manipulate:
import pandas as pd
data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "
There are four idiomatic pandas ways to do this.
pivot, set_index, pivot_table, groupby
pivot
df.pivot(index='grouping', columns='labels', values='count')
set_index
df.set_index(['grouping', 'labels'])['count'].unstack()
pivot_table
df.pivot_table('count', 'grouping', 'labels')
groupby
df.groupby(['grouping', 'labels'])['count'].sum().unstack()
All yield:
labels      A      B      C    D
grouping
item1     5.0    1.0    8.0  NaN
item2     3.0  731.0  189.0  9.0
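As a self-contained check, the sketch below builds a small frame and confirms that the four approaches agree. The sample values are an assumption reconstructed from the output shown above, not the original data from the question.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output table above.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

by_pivot = df.pivot(index="grouping", columns="labels", values="count")
by_set_index = df.set_index(["grouping", "labels"])["count"].unstack()
by_pivot_table = df.pivot_table(values="count", index="grouping", columns="labels")
by_groupby = df.groupby(["grouping", "labels"])["count"].sum().unstack()

# Compare values and labels only; dtypes may differ between approaches.
for other in (by_set_index, by_pivot_table, by_groupby):
    pd.testing.assert_frame_equal(by_pivot, other, check_dtype=False)
```

Keyword arguments are used throughout so the role of each column is explicit.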
[Figure: timing]
With the groupby, set_index, or pivot_table approach, you can easily fill in missing values by passing fill_value=0:
df.pivot_table('count', 'grouping', 'labels', fill_value=0)
df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
df.set_index(['grouping', 'labels'])['count'].unstack(fill_value=0)
All yield:
labels    A    B    C  D
grouping
item1     5    1    8  0
item2     3  731  189  9
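pivot itself takes no fill_value argument, so the equivalent there is to fill after reshaping. A minimal sketch, on assumed sample data reconstructed from the output above; the astype(int) step is optional and only restores integer values after the NaN-induced float conversion.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output table above.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

filled = (
    df.pivot(index="grouping", columns="labels", values="count")
      .fillna(0)      # replace the missing item1/D cell with 0
      .astype(int)    # NaN forced floats; convert back to integers
)
```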
Additional thoughts on groupby
We don't require any aggregation here, because each (grouping, labels) pair occurs only once. If we still want to use groupby, we can minimize the cost of the implicit aggregation by using a cheaper aggregator.
df.groupby(['grouping', 'labels'])['count'].max().unstack()
or
df.groupby(['grouping', 'labels'])['count'].first().unstack()
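The choice of aggregator only matters when a (grouping, labels) pair occurs more than once. The sketch below checks for that before reshaping. The helper name to_wide is hypothetical, the sample data is assumed, and sum is just one possible aggregator for the duplicate case.

```python
import pandas as pd

def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape to wide form, aggregating only if key pairs repeat."""
    keys = ["grouping", "labels"]
    if df.duplicated(subset=keys).any():
        # Duplicates present: an aggregation is required; sum is one choice.
        return df.groupby(keys)["count"].sum().unstack()
    # No duplicates: a plain reshape suffices and no aggregation runs.
    return df.set_index(keys)["count"].unstack()

# Assumed sample data, reconstructed from the output table above.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

wide_unique = to_wide(df)

# Add a repeated (item1, A) pair to exercise the aggregation branch.
extra = pd.DataFrame({"grouping": ["item1"], "labels": ["A"], "count": [10]})
wide_dupes = to_wide(pd.concat([df, extra], ignore_index=True))
```

With unique pairs the result matches the tables above; with the extra row, the item1/A cell becomes the sum of both counts.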
[Figure: timing, groupby]