Question
Often when we perform groupby operations using pandas, we may wish to apply several functions across multiple series. groupby.agg seems the natural way to perform these groupings and calculations.

However, there seems to be a discrepancy between how groupby.agg and groupby.apply are implemented, because I cannot group to a list using agg. Tuple and set work fine, which suggests to me that you can only aggregate to immutable types via agg. Via groupby.apply, I can aggregate one series to a list directly with no issues.

Below is a complete example. Functions (1), (2) and (3) complete successfully. (4) fails with ValueError: Function does not reduce.
import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

def grouper(df, func):
    # Apply func to 'VALUE A'; keep the last 'VALUE B' and 'VALUE C' of each group
    f = {'VALUE A': lambda x: func(x), 'VALUE B': 'last', 'VALUE C': 'last'}
    # Select several columns with a list (double brackets), not a bare tuple of names
    return df.groupby(['NAME', 'DATE', 'TYPE'])[['VALUE A', 'VALUE B', 'VALUE C']]\
             .agg(f).reset_index()
# (1) SUCCESS
grouper(df, set)
# (2) SUCCESS
grouper(df, tuple)
# (3) SUCCESS
df.groupby(['NAME', 'DATE', 'TYPE', 'VALUE B', 'VALUE C'])['VALUE A']\
  .apply(list).reset_index()
# (4) FAIL
grouper(df, list)
# AttributeError
# ValueError: Function does not reduce
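If you are stuck on a pandas release that still rejects lists in agg, one possible workaround follows from (2): aggregate to tuples, which agg accepts, and convert each tuple to a list once the aggregation has finished. This is a minimal sketch assuming tuple aggregation behaves as it does in (2); the names `keys` and `out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
     ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
     ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
    columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

keys = ['NAME', 'DATE', 'TYPE']

# Aggregate 'VALUE A' to tuples (accepted by agg), keep the last of the others
out = (df.groupby(keys)
         .agg({'VALUE A': tuple, 'VALUE B': 'last', 'VALUE C': 'last'})
         .reset_index())

# Convert each tuple to a list after aggregation, outside of agg's checks
out['VALUE A'] = out['VALUE A'].map(list)
print(out)
```

Unlike (3), this keeps 'VALUE B' and 'VALUE C' as aggregated columns instead of grouping by them.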
Answer 1:
After much investigation, I have discovered this is a bug, which will be fixed in a future release of pandas.
The offending code is in groupby.py in pandas 0.22.x; note the isinstance(res, list) check:
def _aggregate_series_pure_python(self, obj, func):
    group_index, _, ngroups = self.group_info
    counts = np.zeros(ngroups, dtype=int)
    result = None
    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            # Only the first group's result is checked; a list is rejected here
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

    result = lib.maybe_convert_objects(result, try_float=0)
    return result, counts
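To see why (1) and (2) pass while (4) fails, here is a minimal pure-Python model of the loop above. The function `aggregate_groups` and its `reject_lists` flag are hypothetical; the model leaves out the Series/Index/ndarray part of the check and the NumPy object array, and only mirrors the list check plus the fact that just the first group's result is inspected. Setting `reject_lists=False` mimics the master-branch version.

```python
def aggregate_groups(values, labels, ngroups, func, reject_lists=True):
    """Hypothetical, simplified model of _aggregate_series_pure_python.

    labels[i] is the integer group number of values[i]. reject_lists=True
    mimics the 0.22.x list check; reject_lists=False mimics master.
    """
    # Split values into one list per group (stand-in for get_splitter)
    groups = [[] for _ in range(ngroups)]
    for value, label in zip(values, labels):
        groups[label].append(value)

    counts = [0] * ngroups
    result = None
    for label, group in enumerate(groups):
        res = func(group)
        if result is None:
            # Only the first group's result is type-checked, as in the original
            if reject_lists and isinstance(res, list):
                raise ValueError('Function does not reduce')
            result = [None] * ngroups
        counts[label] = len(group)
        result[label] = res
    return result, counts


values = ['blah', 'blah2', 'blah']  # 'VALUE A' from the question
labels = [0, 0, 1]                   # 0 = AType group, 1 = BType group

print(aggregate_groups(values, labels, 2, tuple))
# ([('blah', 'blah2'), ('blah',)], [2, 1])
print(aggregate_groups(values, labels, 2, list, reject_lists=False))
# ([['blah', 'blah2'], ['blah']], [2, 1])

error_message = None
try:
    aggregate_groups(values, labels, 2, list)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Function does not reduce
```

Tuples and sets pass simply because they are not lists, so mutability plays no role in the check.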
In groupby.py on the master branch, the isinstance(res, list) check is omitted:
def _aggregate_series_pure_python(self, obj, func):
    group_index, _, ngroups = self.group_info
    counts = np.zeros(ngroups, dtype=int)
    result = None
    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray))):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

    result = lib.maybe_convert_objects(result, try_float=0)
    return result, counts
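Assuming a pandas release that contains this change (the exact version is not stated here), `list` can be passed straight to agg alongside the string aliases used in the question. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    [['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
     ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
     ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
    columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

# With the list check removed, the builtin list works as an aggregation function
out = (df.groupby(['NAME', 'DATE', 'TYPE'])
         .agg({'VALUE A': list, 'VALUE B': 'last', 'VALUE C': 'last'})
         .reset_index())
print(out)
```

This is case (4) without the `grouper` wrapper; on a release that still has the 0.22.x check it should raise the same ValueError.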
Source: https://stackoverflow.com/questions/48910956/pandas-agg-to-list-attributeerror-valueerror-function-does-not-reduce