Applying different aggregate functions to different columns (now that dict with renaming is deprecated)

心已入冬 提交于 2020-01-01 05:29:07

问题


I had asked this question before: python pandas: applying different aggregate functions to different columns but the latest changes to pandas https://github.com/pandas-dev/pandas/pull/15931 mean that what I thought was an elegant and pythonic solution is deprecated, for reasons I genuinely fail to understand.

The question was, and still is: when doing a groupby, how can I apply different aggregate functions to different fields (e.g. sum of x, avg of x, min of y, max of z, etc.) and rename the resulting fields, all in one go, or at least in a possibly pythonic and not-too-cumbersome way? I.e. sum_x won't do, I need to rename the fields explicitly.

This approach, which I liked:

df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
                                "unemp": {"mean_unemp": "mean"}})

will be deprecated and now produces this warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version

Thanks!


回答1:


agg() is not deprecated but renaming using agg is.

Do go through the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming

What is deprecated: 1. Passing a dict to a grouped/rolled/resampled Series that allowed one to rename the resulting aggregation 2. Passing a dict-of-dicts to a grouped/rolled/resampled DataFrame.

This will work, though its not a single line of code

df.groupby('qtr').agg({"realgdp": ["mean",  "std"], "unemp": "mean"})

df.columns = df.columns.map('_'.join)

df.rename(columns = {'realgdp_mean': 'mean_gdp', 'realgdp_std':'std_gdp', 'unemp_mean':'mean_unemp'}, inplace = True)


来源:https://stackoverflow.com/questions/46694207/applying-different-aggregate-functions-to-different-columns-now-that-dict-with

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!