How to loop over grouped Pandas dataframe?

Anonymous (unverified), submitted 2019-12-03 01:55:01

Question:

DataFrame:

      c_os_family_ss c_os_major_is  l_customer_id_i
    0      Windows 7                          90418
    1      Windows 7                          90418
    2      Windows 7                          90418

Code:

    print df

    for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
        print name
        print group

I'm trying to just loop over the aggregated data, but I get the error:

ValueError: too many values to unpack

@EdChum, here's the expected output:

                                                        c_os_family_ss  \
    l_customer_id_i
    131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
    135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...

                                                         c_os_major_is
    l_customer_id_i
    131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
    135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...

The output is not the problem; I want to loop over every group.

Answer 1:

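df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups. Iterating over a DataFrame yields its column labels, so "for name, group in ..." tries to unpack each label string into two names, which raises the ValueError.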

In general:

  • df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), which you can iterate over to get the groups (as explained in the pandas groupby docs). You can do something like:

        grouped = df.groupby('A')

        for name, group in grouped:
            ...

  • When you apply a function to the groupby, as in your example df.groupby(...).agg(...) (this could also be transform, apply, mean, ...), the results of applying the function to each group are combined into a single object (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). The result is therefore always a DataFrame (or a Series, depending on the applied function), not a collection of groups.
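The two cases above can be sketched side by side. This is only an illustration: the sample data is hypothetical (modelled on the question's column names), and it uses Python 3's print() rather than the Python 2 print statement from the question.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's columns.
df = pd.DataFrame({
    "l_customer_id_i": [90418, 90418, 131572],
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows 8"],
    "c_os_major_is": ["", "", ""],
})

# Split: iterating the GroupBy object yields (group key, sub-DataFrame) pairs.
group_sizes = {}
for name, group in df.groupby("l_customer_id_i"):
    group_sizes[name] = len(group)
    print(name)
    print(group)

# Apply + combine: the aggregation returns one DataFrame indexed by group key.
aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))

# Iterating a DataFrame directly yields its column labels, not groups,
# which is why "for name, group in aggregated" cannot be unpacked.
column_labels = list(aggregated)
print(column_labels)
```

If the per-group rows are what you need inside the loop, iterate over the GroupBy object and aggregate inside the loop body instead of calling agg first.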



Answer 2:

You can iterate over the index values if your dataframe has already been created.

    df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

    for name in df.index:
        print name
        print df.loc[name]
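As a possible variation on the same idea, DataFrame.iterrows() yields each index value together with its row, so the separate df.loc lookup is not needed. The sketch below assumes Python 3 and uses hypothetical sample data with the question's column names.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's columns.
df = pd.DataFrame({
    "l_customer_id_i": [90418, 90418, 131572],
    "c_os_family_ss": ["Windows 7", "Windows 7", "Windows 8"],
    "c_os_major_is": ["", "", ""],
})

aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))

# iterrows() yields (index label, row as a Series) for each aggregated row.
rows = []
for name, row in aggregated.iterrows():
    rows.append((name, row["c_os_family_ss"]))
    print(name)
    print(row)
```

Each row here is the already-aggregated result for one customer id, not the original group of rows.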

