Pandas `agg` to list, “AttributeError / ValueError: Function does not reduce”

Submitted by 拟墨画扇 on 2019-12-11 04:05:14

Question


Often when we perform groupby operations using pandas we may wish to apply several functions across multiple series.

groupby.agg seems the natural way to perform these groupings and calculations.

However, there seems to be a discrepancy between how groupby.agg and groupby.apply are implemented, because I cannot group to a list using agg. Tuple and set work fine, which suggests to me you can only aggregate to immutable types via agg. Via groupby.apply, I can aggregate one series to a list directly with no issues.

Below is a complete example. Calls (1), (2) and (3) complete successfully. Call (4) fails with ValueError: Function does not reduce.

import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])


def grouper(df, func):
    f = {'VALUE A': lambda x: func(x), 'VALUE B': 'last', 'VALUE C': 'last'}
    # Select several columns with a list; a bare tuple of labels is rejected by newer pandas
    return df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]\
             .agg(f).reset_index()

# (1) SUCCESS
grouper(df, set)

# (2) SUCCESS
grouper(df, tuple)

# (3) SUCCESS
df.groupby(['NAME', 'DATE', 'TYPE', 'VALUE B', 'VALUE C'])['VALUE A']\
  .apply(list).reset_index()

# (4) FAIL
grouper(df, list)

# AttributeError
# ValueError: Function does not reduce

Answer 1:


After much investigation, I have discovered this is a bug, which will be fixed in a future release of pandas.

The offending code is in groupby.py in 0.22.x; note the isinstance(res, list) check:

def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

    result = lib.maybe_convert_objects(result, try_float=0)
    return result, counts
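
To see why set and tuple pass while list fails, the guard above can be imitated outside pandas. The sketch below is only an illustration: rejects_result and its treat_list_as_non_reducing flag are hypothetical names, not part of pandas, and the function reproduces just the isinstance test from the quoted snippet.

```python
import numpy as np
import pandas as pd


def rejects_result(res, treat_list_as_non_reducing=True):
    """Hypothetical stand-in for the type guard quoted above.

    Returns True when the guard would raise 'Function does not reduce'.
    """
    if isinstance(res, (pd.Series, pd.Index, np.ndarray)):
        return True
    # The extra test that 0.22.x applies and that the answer identifies as the bug
    return treat_list_as_non_reducing and isinstance(res, list)


group = pd.Series(['blah', 'blah2'])

# With the list test in place, only list is rejected
outcomes = {func.__name__: rejects_result(func(group))
            for func in (set, tuple, list)}
print(outcomes)

# Without the list test, list is accepted like set and tuple
print(rejects_result(list(group), treat_list_as_non_reducing=False))
```

The result depends only on the type of the returned object, not on whether it is mutable, which is why set is accepted even though it is mutable.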

On the master branch of groupby.py, the isinstance(res, list) check is omitted:

def _aggregate_series_pure_python(self, obj, func):

        group_index, _, ngroups = self.group_info

        counts = np.zeros(ngroups, dtype=int)
        result = None

        splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

        for label, group in splitter:
            res = func(group)
            if result is None:
                if (isinstance(res, (Series, Index, np.ndarray))):
                    raise ValueError('Function does not reduce')
                result = np.empty(ngroups, dtype='O')

            counts[label] = group.shape[0]
            result[label] = res

        result = lib.maybe_convert_objects(result, try_float=0)
        return result, counts
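
Until an installed version includes the fix, one possible workaround is to aggregate to a tuple, which the question shows is accepted, and convert the resulting column to lists afterwards. This is a sketch rather than an official recommendation; the frame is the one from the question, and the variable names agg_funcs and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

# Aggregate 'VALUE A' to a tuple, which passes the type guard
agg_funcs = {'VALUE A': lambda x: tuple(x), 'VALUE B': 'last', 'VALUE C': 'last'}
result = (df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]
            .agg(agg_funcs)
            .reset_index())

# Convert each tuple to a list after the aggregation has finished
result['VALUE A'] = result['VALUE A'].map(list)
print(result)
```

The conversion happens on the already aggregated column, so the guard inside agg never sees a list.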


Source: https://stackoverflow.com/questions/48910956/pandas-agg-to-list-attributeerror-valueerror-function-does-not-reduce
