Pandas 0.25.0: groupby on categoricals

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 05:22:06

Question


I'm having some difficulties using Pandas 0.25.0, which was released last month.

Consider this data frame:

import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

Say we want to group by the first two columns. The resulting data frame should contain 3 rows, since the combination (b, m) does not occur in the data.

df.groupby(['A', 'B']).agg({'C': 'sum'})

In Pandas 0.24.1 and earlier, this works fine:

     C
A B   
a m  1
  o  4
b o  5

However, in Pandas 0.25.0 this is broken:

       C
A B     
a m  1.0
  o  4.0
b m  NaN
  o  5.0
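Why four rows? When any grouper is categorical and observed=False (the default in 0.25), pandas reindexes the result to the Cartesian product of the categories of A and the values of B that occur in the data. The sketch below re-creates the question's df and rebuilds that product by hand with pd.MultiIndex.from_product to compare it with the groupby index. It passes observed=False explicitly because newer pandas versions warn about the implicit default; the value filled in for the empty (b, m) group may also differ between versions, so only the row labels are compared.

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=False: every category of A is crossed with every value of B
full = df.groupby(['A', 'B'], observed=False).agg({'C': 'sum'})

# The same row labels, built by hand as a Cartesian product
product = pd.MultiIndex.from_product(
    [df['A'].cat.categories, sorted(df['B'].unique())],
    names=['A', 'B'],
)

print(len(full))          # 4
print(list(full.index))   # [('a', 'm'), ('a', 'o'), ('b', 'm'), ('b', 'o')]
print(list(product))      # the same four pairs
```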

I know I can suppress this unwanted behaviour by adding observed=True to the groupby call, but that was not necessary in the old version, and I cannot find anything related in the release notes.
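For comparison, here is a minimal sketch of the observed=True workaround applied to the question's df. Only (A, B) combinations that actually occur in the data are kept, which gives the same three rows that 0.24.1 returned.

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# observed=True drops category combinations with no rows in the data
observed_only = df.groupby(['A', 'B'], observed=True).agg({'C': 'sum'})

print(list(observed_only.index))     # [('a', 'm'), ('a', 'o'), ('b', 'o')]
print(observed_only['C'].tolist())   # [1, 4, 5]
```

Passing observed explicitly also keeps the result independent of the default: pandas 2.1 deprecated the implicit observed=False and announced a switch to observed=True in a future release.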

How come? Is this a bug in pandas? Did I miss something?


Answer 1:


Thanks to ALollz's comment, I think I know what happened:

I (unknowingly) relied on a bug in 0.24, and that is why the update to 0.25 broke my code.
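If the categorical dtype is not actually needed for the grouping step, another option (not part of the original answer) is to group on plain object values. The observed parameter only applies when at least one grouper is categorical, so without one pandas produces only the combinations present in the data, whatever its default. A minimal sketch, assuming A can safely be cast to object:

```python
import pandas as pd

df = pd.DataFrame({
    'A': pd.Series(['a', 'b', 'b', 'a'], dtype='category'),
    'B': pd.Series(['m', 'o', 'o', 'o']),
    'C': pd.Series([1, 2, 3, 4]),
})

# Casting A to object removes the only categorical grouper,
# so observed no longer has any effect on the result
plain = df.astype({'A': 'object'}).groupby(['A', 'B']).agg({'C': 'sum'})

print(list(plain.index))     # [('a', 'm'), ('a', 'o'), ('b', 'o')]
print(plain['C'].tolist())   # [1, 4, 5]
```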



Source: https://stackoverflow.com/questions/57559298/pandas-0-25-0-groupby-on-categoricals
