Pandas groupby with categories with redundant NaN

说谎 2020-12-05 02:26

I am having issues using pandas groupby with categorical data. Theoretically, it should be super efficient: you are grouping and indexing via integers rather than strings. B

5 answers
  •  醉梦人生
    2020-12-05 03:30

    Since pandas 0.23.0, groupby accepts an observed parameter. Setting observed=True (it was False by default in that release) limits the result to the category combinations that actually occur in the data, which fixes this issue. Below is the exact same code as in the question, with only observed=True added:

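    import pandas as pd
    
    group_cols = ['Group1', 'Group2', 'Group3']
    
    df = pd.DataFrame([['A', 'B', 'C', 54.34],
                       ['A', 'B', 'D', 61.34],
                       ['B', 'A', 'C', 514.5],
                       ['B', 'A', 'A', 765.4],
                       ['A', 'B', 'D', 765.4]],
                      columns=(group_cols+['Value']))
    
    # Convert the grouping columns to the categorical dtype.
    for col in group_cols:
        df[col] = df[col].astype('category')
    
    # observed=True keeps only the category combinations present in df;
    # as_index=False returns the group keys as ordinary columns.
    df.groupby(group_cols, as_index=False, observed=True).sum()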
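    To make the effect of observed visible, here is a minimal sketch that runs the same grouping both ways and compares the number of result rows. The names all_combos and seen are illustrative and not part of the original answer, and the Value column is selected explicitly so that the result is a Series indexed by the group keys.

```python
import pandas as pd

group_cols = ['Group1', 'Group2', 'Group3']

df = pd.DataFrame([['A', 'B', 'C', 54.34],
                   ['A', 'B', 'D', 61.34],
                   ['B', 'A', 'C', 514.5],
                   ['B', 'A', 'A', 765.4],
                   ['A', 'B', 'D', 765.4]],
                  columns=group_cols + ['Value'])

for col in group_cols:
    df[col] = df[col].astype('category')

# observed=False: one entry per combination of categories.
# Group1 has 2 categories, Group2 has 2, Group3 has 3, so 2 * 2 * 3 = 12.
all_combos = df.groupby(group_cols, observed=False)['Value'].sum()

# observed=True: only the 4 combinations that actually appear in df.
seen = df.groupby(group_cols, observed=True)['Value'].sum()

print(len(all_combos), len(seen))  # prints: 12 4
```

    The redundant rows in the question come from the first case: with categorical keys, groupby can build the full Cartesian product of the categories, and most of those combinations have no data behind them.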
    
