问题
I have a data frame called df
:
Date Sales
01/01/2020 812
02/01/2020 981
03/01/2020 923
04/01/2020 1033
05/01/2020 988
... ...
How can I get the first occurrence of 7 consecutive days with sales above 1000?
This is what I am doing to find the rows where sales is above 1000:
In [221]: df.loc[df["sales"] >= 1000]
Out [221]:
Date Sales
04/01/2020 1033
08/01/2020 1008
09/01/2020 1091
17/01/2020 1080
18/01/2020 1121
19/01/2020 1098
... ...
回答1:
You can assign a unique identifier per consecutive days, group by them, and return the first value per group (with a previous filter of values > 1000):
df = df.query('Sales > 1000').copy()
df['grp_date'] = df.Date.diff().dt.days.fillna(1).ne(1).cumsum()
df.groupby('grp_date').head(7).reset_index(drop=True)
where you can change the value of head
parameter to the first n
rows from consecutive days.
Note: you may need to use pd.to_datetime(df.Date, format='%d/%m/%Y')
to convert dates from strings to pandas datetime, and sort them.
回答2:
Couldn't you just sort by date and grab head 7?
df = df.sort_values('Date')
df.loc[df["sales"] >= 1000].head(7)
If you need the original maybe make a copy first
来源:https://stackoverflow.com/questions/65146320/pandas-how-to-get-rows-with-consecutive-dates-and-sales-more-than-1000