pandas groupby - custom function

Submitted by 血红的双手。 on 2019-12-23 16:51:45

Question


I have the following DataFrame, on which I call groupby and sum():

import numpy as np
import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C","C"], 'col2': [1,2,3,4,5,6, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data=d)

df.groupby("col1").sum()

This results in the following:

      col2
col1
A      6.0
B     15.0
C      0.0

I want C to show NaN instead of 0, since all of the values for C are NaN. How can I accomplish this? Should I use apply() with a lambda function? Any help would be appreciated.


Answer 1:


Thanks to @piRSquared, @Alollz, and @anky_91:

You can do this without setting and resetting the index by passing min_count=1 to sum():

d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C","C"], 'col2': [1,2,3,4,5,6, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data=d)

df.groupby("col1", as_index=False).sum(min_count=1)

Output:

  col1  col2
0    A   6.0
1    B  15.0
2    C   NaN
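To make the effect of min_count concrete, here is a minimal sketch that reuses the DataFrame from the question and compares the default behaviour with min_count=1. The variable names default_sum and strict_sum are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "col2": [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan],
})

# Default (min_count=0): NaN values are skipped, so an all-NaN group sums to 0.0.
default_sum = df.groupby("col1")["col2"].sum()

# min_count=1: a group needs at least one non-NaN value, otherwise the result is NaN.
strict_sum = df.groupby("col1")["col2"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Groups A and B are unaffected by min_count=1 because they contain non-NaN values; only group C changes.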



Answer 2:


Use this:

df.groupby('col1').apply(pd.DataFrame.sum,skipna=False).reset_index(drop=True)
#Or --> df.groupby('col1',as_index=False).apply(pd.DataFrame.sum,skipna=False)

Without the apply() thanks to @piRSquared:

df.set_index('col1').sum(level=0, skipna=False).reset_index()

Thanks to @Alollz: if a group that contains some NaN values should still return the sum of its non-NaN values, with NaN only for groups that are entirely NaN, use min_count=1:

df.set_index('col1').sum(level=0,min_count=1).reset_index()

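Output of the apply() version (col1 shows AAA, BBB, CCC because summing each group also concatenates the strings in col1):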

  col1  col2
0  AAA   6.0
1  BBB  15.0
2  CCC   NaN
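The difference between skipna=False and min_count=1 only shows up when a group mixes numbers and NaN, which the question's data does not. The sketch below uses a modified, hypothetical DataFrame (partial) in which group A contains one NaN. It also avoids sum(level=0), since, as far as I know, the level argument was deprecated and then removed in pandas 2.0; a plain groupby is used instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group A is partly NaN, group C is entirely NaN.
partial = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "col2": [1, 2, np.nan, 4, 5, 6, np.nan, np.nan, np.nan],
})

grouped = partial.groupby("col1")["col2"]

# skipna=False: any NaN in a group makes the whole group's sum NaN.
strict = grouped.agg(lambda s: s.sum(skipna=False))

# min_count=1: NaN values are skipped; only an all-NaN group yields NaN.
lenient = grouped.sum(min_count=1)

print(pd.DataFrame({"skipna_false": strict, "min_count_1": lenient}))
```

With this data, group A is NaN under skipna=False but 3.0 under min_count=1, while group C is NaN in both cases.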



Answer 3:


Pass skipna=False in the call to sum.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html

That link should provide the documentation you need, and I expect it will fix your problem.
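The linked page documents DataFrame.sum rather than the groupby version, so as a rough illustration here is how its skipna and min_count parameters behave on a plain DataFrame. The frame and its column names x and y are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column x has one NaN, column y is entirely NaN.
frame = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, np.nan]})

# Default: NaN values are skipped, so y sums to 0.0.
skipping = frame.sum()

# skipna=False: any NaN in a column makes that column's sum NaN.
no_skip = frame.sum(skipna=False)

# min_count=1: NaN values are skipped, but an all-NaN column yields NaN.
at_least_one = frame.sum(min_count=1)

print(skipping, no_skip, at_least_one, sep="\n\n")
```

Note that skipna=False also turns partly-NaN columns into NaN, so min_count=1 is the closer match to what the question asks for.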



Source: https://stackoverflow.com/questions/54909004/pandas-groupby-custom-function
