Question
I have a multi-level DataFrame that looks like this:
                      date_time  name  note  value
list index
1    0      2015-05-22 05:37:59   Tom   129  False
     1      2015-05-22 05:38:59   Tom     0   True
     2      2015-05-22 05:39:59   Tom     0  False
     3      2015-05-22 05:40:59   Tom    45   True
2    4      2015-05-22 05:37:59  Kate   129   True
     5      2015-05-22 05:41:59  Kate     0  False
     5      2015-05-22 05:37:59  Kate     0   True
I want to iterate over the list level and, for the first row of each list, check the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value.
I use this code, which seems logical to me:
def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df
but I get this error:
AttributeError: '_LocIndexer' object has no attribute 'groupby'
Could you explain what's wrong with my method?
Answer 1:
Your general approach -- using loops -- rarely works the way you want in pandas.
If you have a groupby object, you should use the apply, agg, filter or transform methods. In your case apply is appropriate.
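As a rough illustration of how those four methods differ, here is a minimal sketch on a made-up frame. The names toy, key and x are hypothetical and are not part of your data.

```python
import pandas as pd

# Hypothetical toy data, only to show the shape each method returns.
toy = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 5],
})
grouped = toy.groupby("key")["x"]

# agg: collapses each group to a single value
totals = grouped.agg("sum")

# transform: returns a result aligned with the original rows
centered = toy["x"] - grouped.transform("mean")

# filter: keeps or drops whole groups based on a per-group test
big_groups = toy.groupby("key").filter(lambda g: len(g) > 2)

# apply: runs an arbitrary function on each group and glues the pieces together;
# here it drops the first element of every group
tails = grouped.apply(lambda s: s.iloc[1:])

print(totals)
print(centered)
print(big_groups)
print(tails)
```

Dropping one row from some groups changes the group's length without collapsing it to a single value, which is why apply is the natural fit here rather than agg, transform or filter.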
Your main goal is the following:
So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).
So let's write a simple function to do just that on a single, stand-alone dataframe:
def filter_firstrow_falses(df):
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df
OK. Simple enough.
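Before grouping anything, it may help to try the function on two small stand-alone frames. This is only a sketch: starts_false and starts_true are made-up examples, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' entry is False.
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

# Hypothetical frames: one starts with False, the other with True.
starts_false = pd.DataFrame({
    "name": ["Tom", "Tom", "Tom"],
    "value": [False, True, False],
})
starts_true = pd.DataFrame({
    "name": ["Kate", "Kate"],
    "value": [True, False],
})

trimmed = filter_firstrow_falses(starts_false)    # first row removed
untouched = filter_firstrow_falses(starts_true)   # returned unchanged

print(trimmed)
print(untouched)
```

Note that only the first row is inspected; a False further down the frame is left alone.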
Now, let's apply that to each group of your real dataframe:
import pandas
from io import StringIO
csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")
df = pandas.read_csv(csv)
final = (
    df.groupby(by=['list'])             # create the groupby object
      .apply(filter_firstrow_falses)    # apply our function to each group
      .reset_index(drop=True)           # clean up the index
)
print(final)
   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
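If you would rather not call a Python function per group, the same rows can probably be selected with a boolean mask instead. This is an alternative sketch, not the approach above: it assumes the value column was parsed as real booleans (as read_csv does for this data) and uses cumcount to mark the first row of each list group.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")
df = pd.read_csv(csv)

# True for the first row of each 'list' group, False elsewhere
is_first = df.groupby("list").cumcount() == 0

# Rows to drop: first in their group AND value is False
to_drop = is_first & ~df["value"]

final = df[~to_drop].reset_index(drop=True)
print(final)
```

The mask version never hands a sub-frame to a Python function, so it tends to scale better when there are many groups. The apply version is easier to extend if the per-group rule becomes more involved.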
Source: https://stackoverflow.com/questions/33505339/returning-subset-of-each-group-from-a-pandas-groupby-object