Returning subset of each group from a pandas groupby object

Question

I have a multi-level dataframe that looks like this:

                      date_time      name  note   value
list index                                    
1    0     2015-05-22 05:37:59       Tom   129    False
     1     2015-05-22 05:38:59       Tom     0    True
     2     2015-05-22 05:39:59       Tom     0    False
     3     2015-05-22 05:40:59       Tom    45    True
2    4     2015-05-22 05:37:59       Kate   129    True
     5     2015-05-22 05:41:59       Kate     0    False
     5     2015-05-22 05:37:59       Kate     0    True

I want to iterate over list and, for the first row of each list group, check the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value. I use this code, which seems logical:

def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df

but I get this error:

AttributeError: '_LocIndexer' object has no attribute 'groupby'

Could you explain what's wrong with my method?

Answer 1:

Your general approach -- using loops -- rarely works the way you want in pandas.

If you have a groupby object, you should use the apply, agg, filter or transform methods. In your case apply is appropriate.
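
As a rough illustration of how these methods differ, here is a minimal sketch on a toy dataframe. The column names key and x and the variable names are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Toy data: two groups, "a" (two rows) and "b" (one row)
toy = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 10]})
grouped = toy.groupby("key")["x"]

# agg: reduces each group to a single value (one row per group)
totals = grouped.agg("sum")

# transform: returns a result aligned with the original rows
centered = grouped.transform(lambda s: s - s.mean())

# filter: keeps or drops whole groups based on a condition
big_groups = toy.groupby("key").filter(lambda g: g["x"].sum() > 5)

print(totals)
print(centered)
print(big_groups)
```

apply is the most general of the four: the function receives each group and may return an object of any shape, which is why it suits "drop one row from some groups".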

Your main goal is the following:

So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).

So let's write a simple function to do just that on a single, stand-alone dataframe:

def filter_firstrow_falses(df):
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

OK. Simple enough.
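
To check the function before involving groupby, it can be tried on two small stand-alone dataframes. This is only a sketch: the frames starts_false and starts_true are hypothetical test data, not taken from the question.

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' entry is False
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

# Hypothetical frames: one starts with False, the other with True
starts_false = pd.DataFrame({'note': [129, 0, 45], 'value': [False, True, False]})
starts_true = pd.DataFrame({'note': [129, 0], 'value': [True, False]})

trimmed = filter_firstrow_falses(starts_false)    # first row removed
untouched = filter_firstrow_falses(starts_true)   # returned unchanged

print(trimmed)
print(untouched)
```

Note that only the first row is inspected; a False further down (as in the last row of starts_false) is left alone.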

Now, let's apply that to each group of your real dataframe:

import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list']) # create the groupby object
      .apply(filter_firstrow_falses) # apply our function to each group
      .reset_index(drop=True) # clean up the index
)
print(final)


   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
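
Recent pandas releases have changed how groupby.apply treats the grouping column, so the exact output above may vary by version. As an alternative that avoids apply, the same rows can be selected with a boolean mask built from cumcount. This is a sketch of a different technique, not the approach used above; the date_time column is omitted for brevity and the names is_first and drop are made up for this example.

```python
import pandas as pd

# Same data as above, without the date_time column
df = pd.DataFrame({
    'list': [1, 1, 1, 1, 2, 2, 2],
    'name': ['Tom'] * 4 + ['Kate'] * 3,
    'note': [129, 0, 0, 45, 129, 0, 0],
    'value': [False, True, False, True, True, False, True],
})

# cumcount numbers the rows within each group starting at 0,
# so 0 marks the first row of every group
is_first = df.groupby('list').cumcount() == 0

# Rows to drop: first in their group AND value is False
drop = is_first & ~df['value']

final = df[~drop].reset_index(drop=True)
print(final)
```

Because the mask is computed on the original dataframe, the list column is kept regardless of how the installed pandas version handles grouping columns inside apply.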

Source: https://stackoverflow.com/questions/33505339/returning-subset-of-each-group-from-a-pandas-groupby-object

Tags: python, pandas, dataframe, multi-level, loc