Returning subset of each group from a pandas groupby object

Submitted by 半城伤御伤魂 on 2019-12-24 15:00:27

Question


I have a multi-level DataFrame that looks like this:

                      date_time      name  note   value
list index                                    
1    0     2015-05-22 05:37:59       Tom   129    False
     1     2015-05-22 05:38:59       Tom     0    True
     2     2015-05-22 05:39:59       Tom     0    False
     3     2015-05-22 05:40:59       Tom    45    True
2    4     2015-05-22 05:37:59       Kate   129    True
     5     2015-05-22 05:41:59       Kate     0    False
     5     2015-05-22 05:37:59       Kate     0    True

I want to iterate over list and, for the first row of each group, check the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value. I use this code, which seems logical:

def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df

but I get this error:

AttributeError: '_LocIndexer' object has no attribute 'groupby'

Could you explain what's wrong with my method?


Answer 1:


Your general approach -- using loops -- rarely works the way you want in pandas.

If you have a groupby object, you should use the apply, agg, filter or transform methods. In your case apply is appropriate.
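
To make the difference between those four methods concrete, here is a minimal sketch on a small made-up frame (the `toy` data and the variable names are assumptions for illustration, not part of the question's data). Roughly: agg reduces each group to one value, transform returns a result aligned with the original rows, filter keeps or drops whole groups, and apply runs an arbitrary function per group.

```python
import pandas as pd

# Hypothetical toy data, loosely modelled on the question's columns.
toy = pd.DataFrame({
    "list": [1, 1, 2, 2],
    "note": [129, 0, 45, 0],
    "value": [False, True, False, False],
})
g = toy.groupby("list")

# agg: one result per group.
totals = g["note"].agg("sum")

# transform: one result per original row, broadcast within each group.
per_row = g["note"].transform("sum")

# filter: keep only the groups for which the function returns True.
kept = g.filter(lambda d: d["value"].any())

# apply: run an arbitrary function on each group.
spread = g["note"].apply(lambda s: s.max() - s.min())

print(totals, per_row, kept, spread, sep="\n\n")
```

Note that filter decides about entire groups, so it cannot drop a single row from a group; that is why apply is the better fit for this question.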

Your main goal is the following:

So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).

So let's write a simple function to do just that on a single, stand-alone dataframe:

def filter_firstrow_falses(df):
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

OK. Simple enough.
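
Before grouping anything, it may be worth checking the function on two tiny stand-alone frames. This is only a sketch: the frames `starts_false` and `starts_true` are made up for the check, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Drop the first row only when its `value` entry is False.
    if not df["value"].iloc[0]:
        return df.iloc[1:]
    return df

# Hypothetical test frames: one starts with False, the other with True.
starts_false = pd.DataFrame({"name": ["Tom", "Tom"], "value": [False, True]})
starts_true = pd.DataFrame({"name": ["Kate", "Kate"], "value": [True, False]})

trimmed = filter_firstrow_falses(starts_false)   # first row removed
untouched = filter_firstrow_falses(starts_true)  # returned unchanged

print(len(trimmed), len(untouched))
```

One caveat: `.iloc[0]` raises an IndexError on an empty frame, so the function assumes every frame it receives has at least one row.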

Now, let's apply that to each group of your real dataframe:

import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list']) # create the groupby object
      .apply(filter_firstrow_falses) # apply our function to each group
      .reset_index(drop=True) # clean up the index
)
print(final)


   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
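
If you would rather avoid apply altogether, the same result can probably be reached with a boolean mask. This is an alternative sketch, not the approach above: it uses `cumcount` to find the first row of each group and assumes `value` was parsed as a boolean column. It also sidesteps the way apply treats the grouping column, which has changed in newer pandas versions.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")
df = pd.read_csv(csv)

# cumcount numbers the rows inside each group; 0 marks the first row.
is_first = df.groupby("list").cumcount() == 0

# Drop rows that are both first in their group and False in `value`.
final = df[~(is_first & ~df["value"])].reset_index(drop=True)
print(final)
```

For the MultiIndex frame in the question, grouping with `df.groupby(level=0).cumcount()` should build the same mask without turning the index into columns first.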


Source: https://stackoverflow.com/questions/33505339/returning-subset-of-each-group-from-a-pandas-groupby-object
