I am trying to get the chunks of data that have consecutive dates from a pandas DataFrame. My df
looks like below.
DateAnalyzed
Similar questions were asked after this one, with more specific output requirements. Since this one is more general, I would like to contribute here as well.
We can easily assign a unique identifier to each group of consecutive dates with a single line of code:
df['grp_date'] = df.DateAnalyzed.diff().dt.days.ne(1).cumsum()
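Here, every time the difference from the previous date is not exactly one day (this includes the first row, whose difference is NaN), the cumulative sum increases by one; otherwise the row keeps the previous value, so we end up with a unique identifier per group. This assumes DateAnalyzed has a datetime dtype and is sorted in ascending order.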
See the output:
DateAnalyzed Val grp_date
1 2018-03-18 0.470253 1
2 2018-03-19 0.470253 1
3 2018-03-20 0.470253 1
4 2018-09-25 0.467729 2
5 2018-09-26 0.467729 2
6 2018-09-27 0.467729 2
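To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps. The sample frame is rebuilt from the output table above; the intermediate names gap_days and starts_new_run are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the output table above.
df = pd.DataFrame({
    "DateAnalyzed": pd.to_datetime([
        "2018-03-18", "2018-03-19", "2018-03-20",
        "2018-09-25", "2018-09-26", "2018-09-27",
    ]),
    "Val": [0.470253] * 3 + [0.467729] * 3,
})

# Step 1: gap to the previous row in whole days (NaN for the first row).
gap_days = df["DateAnalyzed"].diff().dt.days

# Step 2: True wherever a new run starts, i.e. the gap is not exactly 1 day.
# NaN compares as "not equal", so the first row always starts a run.
starts_new_run = gap_days.ne(1)

# Step 3: a running count of run starts yields one label per consecutive block.
df["grp_date"] = starts_new_run.cumsum()

print(df)
```

If the dates are not already sorted, sorting with df.sort_values('DateAnalyzed') first is probably what you want, since diff() only compares each row with the one directly before it.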
Now, it's easy to groupby "grp_date" and do whatever you wanna do with apply or agg.
Examples:
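# Sum numeric columns across consecutive days (or any other method from pandas groupby)
# numeric_only=True skips the datetime column, which recent pandas versions refuse to sum
df.groupby('grp_date').sum(numeric_only=True)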
# Get the first value and last value per consecutive days
df.groupby('grp_date').apply(lambda x: x.iloc[[0, -1]])
# or df.groupby('grp_date').head(n) for first n days
# Perform custom operation across target-columns (col1 and col2 are placeholder column names)
df.groupby('grp_date').apply(lambda x: (x['col1'] + x['col2']) / x['Val'].mean())
# Multiple operations for a target-column
df.groupby('grp_date').Val.agg(['min', 'max', 'mean', 'std'])
# and so on...
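Since the question asks for the chunks themselves, here is one possible sketch of pulling each run out as its own DataFrame and summarizing where each run starts and ends. The sample data again comes from the output table above; the names chunks, summary, start, end and n_days are my own choices for illustration.

```python
import pandas as pd

# Sample data rebuilt from the output table above.
df = pd.DataFrame({
    "DateAnalyzed": pd.to_datetime([
        "2018-03-18", "2018-03-19", "2018-03-20",
        "2018-09-25", "2018-09-26", "2018-09-27",
    ]),
    "Val": [0.470253] * 3 + [0.467729] * 3,
})
df["grp_date"] = df["DateAnalyzed"].diff().dt.days.ne(1).cumsum()

# One DataFrame per run of consecutive dates.
chunks = [chunk for _, chunk in df.groupby("grp_date")]

# First date, last date and length of each run, using named aggregation.
summary = df.groupby("grp_date").agg(
    start=("DateAnalyzed", "min"),
    end=("DateAnalyzed", "max"),
    n_days=("DateAnalyzed", "size"),
)

print(len(chunks))
print(summary)
```

Iterating over the groupby keeps the original row index inside each chunk; call reset_index(drop=True) on a chunk if you prefer it to start from zero.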