Including the group name in the apply function pandas python

Submitted by 心已入冬 on 2020-05-25 09:34:35

Question


Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?

When I iterate through the groups, I can get the group key via tuple unpacking:

for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
    print(group_name)

...is there a way to also get the group name in the apply function, such as:

temp_dataframe.groupby(level=0, axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument for the apply lambda function?


Answer 1:


I think you should be able to use the name attribute of the group that is passed to the function:

temp_dataframe.groupby(level=0, axis=0).apply(lambda x: foo(x.name, x))

This should work. For example:

In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df

Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
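
The same idea also works with a named function that returns a value instead of printing. Below is a minimal sketch, assuming the same df as above; describe_group is a hypothetical helper made up for illustration, not something from pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

def describe_group(group):
    # Hypothetical helper: group.name holds the key of the current group,
    # and group is the sub-DataFrame for that key.
    return f"{group.name}:{group['b'].sum()}"

# Select only column 'b' so the grouping column is not part of each sub-frame.
result = df.groupby('a')[['b']].apply(describe_group)
print(result)
```

Because the function returns one scalar per group, the result should be a Series indexed by the group keys. Selecting [['b']] before apply is a choice made here so the example does not depend on how a given pandas version treats the grouping column inside apply.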



Answer 2:


For those who came looking for an answer to the question:

Including the group name in the transform function pandas python

and ended up in this thread, please read on.

Given the following input:

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

    col1    col2    col3
0   a       0       0
1   a       1       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

We can access the group name (which is visible from the scope of the calling apply function) like this:

df.groupby('col1') \
.apply(lambda frame: frame \
       .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))

Output:

    col1    col2    col3
0   a       3       0
1   a       4       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

Note that the call to apply is needed in order to obtain a reference to the sub-DataFrame (i.e. frame, a pandas.core.frame.DataFrame), whose name attribute holds the key of the corresponding group. The name attribute of the argument of transform (i.e. col) refers to the column/Series name instead.

Alternatively, one could also loop over the groups and then, within each group, over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
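
For the simple condition used in this answer (add 3 to col2 where col1 is 'a'), one possible vectorised form is a boolean mask with .loc. This is only a sketch under the assumption that the condition depends solely on the group key and the column name; it will not cover cases that really need per-group logic:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Boolean mask selecting the rows that belong to group 'a'.
mask = df['col1'] == 'a'

# Update only col2 for those rows; all other cells are left untouched.
df.loc[mask, 'col2'] += 3
print(df)
```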



Source: https://stackoverflow.com/questions/32460593/including-the-group-name-in-the-apply-function-pandas-python
