pandas - pivot_table with non-numeric values? (DataError: No numeric types to aggregate)

-上瘾入骨i 2021-02-20 12:55

I'm trying to do a pivot of a table containing strings as results.

import pandas as pd

df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A",


        
2 Answers
  • 2021-02-20 13:39

    I think the best compromise is to replace 'on'/'off' with True/False, so that pandas sees values it can aggregate rather than opaque strings and behaves in the expected way.

    df2 = df1.replace({'on': True, 'off': False})
    

    You essentially conceded this in your question. My answer is, I don't think there's a better way, and you should replace 'on'/'off' anyway for whatever comes next.

    As Andy Hayden points out in the comments, you'll get better performance if you replace on/off with 1/0.
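A minimal sketch of the 1/0 route, assuming the full `df1` given in the other answer on this page. The use of `Series.map` on the single `result` column (so the dtype is plainly integer), the `aggfunc='max'` choice, and the name `pivoted` are my assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data, copied from the other answer on this page.
df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"],
})

# Assumption: map only the 'result' column to 1/0 so it becomes numeric.
df2 = df1.copy()
df2['result'] = df2['result'].map({'on': 1, 'off': 0})

# With numeric values, pivot_table has something to aggregate.
# Each (index, variable1, variable2, variable3) group holds one row,
# so 'max' simply passes that single value through.
pivoted = df2.pivot_table(
    index='index',
    columns=['variable1', 'variable2', 'variable3'],
    values='result',
    aggfunc='max',
)

print(pivoted)
```

Cells for combinations that do not occur on a given row come out as NaN, so the values are floats rather than integers.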

  • 2021-02-20 13:46

    My original reply was based on pandas 0.14.1. Since then, several things have changed in the pivot_table function (rows --> index, cols --> columns, ...).

    Additionally, it appears that the original lambda trick I posted no longer works on pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper, because we are not reducing the data set, just transforming it. So I looked harder at unstack.
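Before moving on to unstack, here is a minimal sketch of the reducing-function route just described. It assumes the `df1` defined below; `aggfunc='first'` is my choice of reducer (it is not named in the answer) and `pivoted_first` is a hypothetical variable name. Since every group contains exactly one row, `'first'` returns that row's string unchanged.

```python
import pandas as pd

# Same sample data as in the answer.
df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"],
})

# Assumption: 'first' as the reducing function. It accepts non-numeric
# data, so the string values survive the pivot as they are.
pivoted_first = df1.pivot_table(
    index='index',
    columns=['variable1', 'variable2', 'variable3'],
    values='result',
    aggfunc='first',
)

print(pivoted_first)
```

This works, but as noted above it frames a pure reshape as an aggregation, which is the motivation for the unstack approach that follows.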

    import pandas as pd
    
    df1 = pd.DataFrame({'index' : range(8),
    'variable1' : ["A","A","B","B","A","B","B","A"],
    'variable2' : ["a","b","a","b","a","b","a","b"],
    'variable3' : ["x","x","x","y","y","y","x","y"],
    'result': ["on","off","off","on","on","off","off","on"]})
    
    # these are the columns to end up in the multi-index columns.
    unstack_cols = ['variable1', 'variable2', 'variable3']
    

    First, set an index on the data using 'index' plus the columns you want to unstack, then call unstack with the level argument.

    df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)
    

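    The resulting dataframe has one row per 'index' value and a column MultiIndex built from 'result' and the three variables; each cell holds 'on' or 'off', or NaN where that combination does not occur for the row.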
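A minimal sketch for regenerating and inspecting that dataframe, using the same data and calls as above. The names `wide` and `flat`, and the final step that drops the top `result` level, are my additions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"],
})

# These are the columns that end up in the column MultiIndex.
unstack_cols = ['variable1', 'variable2', 'variable3']

# Pure reshape: no aggregation function is involved.
wide = df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

print(wide.columns.names)  # level names of the column MultiIndex
print(wide)

# Optional (my addition): select the 'result' block to drop the top level.
flat = wide['result']
print(flat)
```

Because unstack only rearranges existing values, the strings are kept as they are and no reducer has to be chosen.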
