Pandas Groupby Agg Function Does Not Reduce

I am using an aggregation function that I have used in my work for a long time now. The idea is that if the Series passed to the function is of length 1 (i.e. the group only

2 Answers

不知归路

2020-12-01 18:36

This is a misfeature in DataFrame. If the aggregator returns a list for the first group, it fails with the 'Function does not reduce' error you mention; if it returns a non-list (non-Series) value for the first group, it works fine. The broken code is in groupby.py:

def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

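Notice the two tests, if result is None and isinstance(res, list): the list check runs only while result is still None, that is, only on the first group's return value. Your options are: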

Fake out groupby().agg(), so it doesn't see a list for the first group, or
Do the aggregation yourself, using code like that above but without the erroneous test.
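The second option can be sketched roughly as follows. This is only an illustration, not pandas' own code: the helper names make_list and aggregate_by_hand and the sample rows are made up for the example, and it assumes the frame is grouped by a list of two or more key columns. It loops over the groups directly and stores each result in an object array, mirroring the loop quoted above but with no type check on the first result.

```python
import numpy as np
import pandas as pd


def make_list(values):
    """Return the lone value of a one-row group, otherwise a list of all values."""
    items = list(values)
    return items if len(items) > 1 else items[0]


def aggregate_by_hand(df, keys, column, func):
    """Apply func to every group of df[column]; no check on the result type.

    Hypothetical helper: assumes keys is a list of at least two column names,
    so that each group key is a tuple.
    """
    index_keys = []
    results = []
    for key, group in df.groupby(keys)[column]:
        index_keys.append(key)
        results.append(func(group))

    # Fill an object array element by element so a list stays in one cell.
    values = np.empty(len(results), dtype=object)
    for i, res in enumerate(results):
        values[i] = res

    index = pd.MultiIndex.from_tuples(index_keys, names=keys)
    return pd.Series(values, index=index, name=column)


# Made-up sample rows; the first group has two rows, so func returns a list for it.
df = pd.DataFrame({
    'date': ['2013-04-02', '2013-04-02', '2013-04-02'],
    'line_code': [401101, 401101, 401102],
    's.m.v.': [7.76, 25.564, 25.564],
})

result = aggregate_by_hand(df, ['date', 'line_code'], 's.m.v.', make_list)
print(result)
```

Calling result.reset_index() afterwards gives a flat frame with the key columns next to the aggregated column, if that layout is preferred.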

独厮守ぢ

2020-12-01 18:52

I can't really explain why, but in my experience lists inside a pandas.DataFrame don't work all that well.

I usually use a tuple instead. This works:

def MakeList(x):
    T = tuple(x)
    if len(T) > 1:
        return T
    else:
        return T[0]

DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

     date line_code           s.m.v.
0  2013-04-02    401101   (7.76, 25.564)
1  2013-04-02    401102           25.564
2  2013-04-02    401103             9.55
3  2013-04-02    401104             4.87
4  2013-04-02    401105   (7.76, 25.564)
5  2013-04-02    401106  (5.282, 25.564)
6  2013-04-02    401107            5.282
