Pivot a pandas DataFrame to be the correct format: `DataError: No numeric types to aggregate`

青春惊慌失措 2020-12-21 06:47

Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "


        
4 Answers
  •  生来不讨喜
    2020-12-21 07:15

    There are four idiomatic pandas ways to do this.

    • No duplicates among grouping columns. Does not require aggregation
      • pivot
      • set_index
    • Duplicates among grouping columns. Does require aggregation
      • pivot_table
      • groupby
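    The split above matters in practice. Here is a minimal sketch, using a small hypothetical frame (not the question's data) in which the pair (item1, A) appears twice: pivot refuses to reshape it, while pivot_table aggregates the duplicates.

```python
import pandas as pd

# Hypothetical data for illustration: (item1, A) occurs twice.
dup = pd.DataFrame({
    "grouping": ["item1", "item1", "item2"],
    "labels": ["A", "A", "B"],
    "count": [5, 2, 3],
})

# pivot cannot reshape duplicated (index, column) pairs and raises ValueError.
try:
    dup.pivot(index="grouping", columns="labels", values="count")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates duplicates; sum is chosen here (the default is mean).
aggregated = dup.pivot_table(
    index="grouping", columns="labels", values="count", aggfunc="sum"
)
print(pivot_failed)
print(aggregated)
```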

    pivot

    df.pivot(index='grouping', columns='labels', values='count')
    

    set_index

    df.set_index(['grouping', 'labels'])['count'].unstack()
    

    pivot_table

    df.pivot_table('count', 'grouping', 'labels')
    

    groupby

    df.groupby(['grouping', 'labels'])['count'].sum().unstack()
    

    All yield

    labels      A      B      C    D
    grouping                        
    item1     5.0    1.0    8.0  NaN
    item2     3.0  731.0  189.0  9.0
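    To check this yourself, here is a self-contained sketch. The question's dict is truncated, so the data below is an assumption reconstructed from the output table; the variable names are mine.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output table in this answer.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# The four approaches; keyword arguments are used because recent pandas
# versions no longer accept positional arguments in DataFrame.pivot.
by_pivot = df.pivot(index="grouping", columns="labels", values="count")
by_set_index = df.set_index(["grouping", "labels"])["count"].unstack()
by_pivot_table = df.pivot_table(values="count", index="grouping", columns="labels")
by_groupby = df.groupby(["grouping", "labels"])["count"].sum().unstack()

print(by_pivot)
```

    The (item1, D) cell is NaN in every result because that pair does not occur in the data.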
    

    With the groupby, set_index, or pivot_table approach, you can easily fill in missing values by passing fill_value=0:

    df.pivot_table('count', 'grouping', 'labels', fill_value=0)
    
    df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
    
    df.set_index(['grouping', 'labels'])['count'].unstack(fill_value=0)
    

    All yield

    labels    A    B    C  D
    grouping                
    item1     5    1    8  0
    item2     3  731  189  9
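    A runnable version of the fill_value variants, again on sample data that I am assuming from the output table (the question's dict is truncated):

```python
import pandas as pd

# Assumed sample data, reconstructed from the output table in this answer.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

# pivot_table takes fill_value directly.
filled_pt = df.pivot_table(
    values="count", index="grouping", columns="labels", fill_value=0
)

# For groupby and set_index, fill_value belongs to unstack.
filled_gb = df.groupby(["grouping", "labels"])["count"].sum().unstack(fill_value=0)
filled_si = df.set_index(["grouping", "labels"])["count"].unstack(fill_value=0)

print(filled_gb)
```

    DataFrame.pivot has no fill_value parameter, so with that approach the closest equivalent is probably a trailing .fillna(0), which leaves the values as floats.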
    

    Additional thoughts on groupby

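    We don't actually require any aggregation here, because each (grouping, labels) pair occurs only once. If we still want to use groupby, we can minimize the cost of the implicit aggregation by choosing a cheaper aggregator that simply returns the single value in each group.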

    df.groupby(['grouping', 'labels'])['count'].max().unstack()
    

    or

    df.groupby(['grouping', 'labels'])['count'].first().unstack()
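    As a quick sanity check, the sketch below compares sum, max, and first on sample data assumed from the output table earlier in this answer. Since every (grouping, labels) pair is unique there, all three aggregators should return the same frame; that would not hold if the pairs were duplicated.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output table in this answer.
df = pd.DataFrame({
    "grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
    "labels": ["A", "B", "C", "A", "B", "C", "D"],
    "count": [5, 1, 8, 3, 731, 189, 9],
})

grouped = df.groupby(["grouping", "labels"])["count"]

# Each group holds exactly one row, so the choice of aggregator
# does not change the values, only how much work is done per group.
by_sum = grouped.sum().unstack()
by_max = grouped.max().unstack()
by_first = grouped.first().unstack()

print(by_first)
```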
    
