Pandas groupby with categories with redundant NaN


说谎 2020-12-05 02:26

I am having issues using pandas groupby with categorical data. Theoretically, it should be super efficient: you are grouping and indexing via integers rather than strings. B

5 answers

悲&欢浪女 (original poster)

2020-12-05 03:06
This behavior matches what is documented in the operations section of the pandas Categorical Data user guide.

In particular, it resembles this example from that section:
```
In [121]: cats2 = pd.Categorical(["a","a","b","b"], categories=["a","b","c"])

In [122]: df2 = pd.DataFrame({"cats":cats2,"B":["c","d","c","d"], "values":[1,2,3,4]})

In [123]: df2.groupby(["cats","B"]).mean()
Out[123]: 
        values
cats B        
a    c     1.0
     d     2.0
b    c     3.0
     d     4.0
c    c     NaN
     d     NaN
```
The same section also describes the related behavior for Series and groupby, quoted below, and ends with a pivot table example.

Apart from Series.min(), Series.max() and Series.mode(), the following operations are possible with categorical data:

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data:
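As a small sketch of that sentence (the Series `s` and its values are made up for illustration and are not from the documentation), counting a categorical Series reports the declared but unused category with a count of zero, and dropping unused categories first removes it:

```python
import pandas as pd

# Hypothetical data: category "c" is declared but never occurs.
s = pd.Series(pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"]))

# value_counts() reports every declared category, so "c" appears with 0.
counts = s.value_counts()
print(counts)

# Removing unused categories first drops the zero-count entry.
trimmed = s.cat.remove_unused_categories().value_counts()
print(trimmed)
```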

Groupby will also show “unused” categories:
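One way to see where the extra NaN rows come from, and how they might be avoided, is the `observed` argument of `groupby`. The sketch below uses illustrative data modeled on the example earlier in this answer. The default value of `observed` depends on the pandas version, so it is passed explicitly here.

```python
import pandas as pd

# Illustrative frame: "c" is a declared category with no rows.
cats = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
df = pd.DataFrame({"cats": cats, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

# observed=False keeps every declared category, so the unused "c"
# contributes rows whose mean is NaN.
full = df.groupby(["cats", "B"], observed=False)["values"].mean()
print(full)

# observed=True keeps only the combinations that occur in the data.
seen = df.groupby(["cats", "B"], observed=True)["values"].mean()
print(seen)
```

If the unused categories are never needed, calling `df["cats"].cat.remove_unused_categories()` before grouping is another option.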

The quoted sentences and the `df2` example are taken from the pandas Categorical Data documentation.