Question
I have a monthly dataset:
import pandas as pd

df = pd.DataFrame({'Month': [1, 2],
                   'Plan': [310, 620],
                   'Month_start_date': ['2020-01-01', '2020-02-01']})
print(df)
# The format string must match the 'YYYY-MM-DD' strings above.
df['Month_start_date'] = (pd.to_datetime(df['Month_start_date'], format='%Y-%m-%d')
                          .dt.to_period('M').dt.to_timestamp())
df = df.set_index('Month_start_date')
I created the list of dates that I would like to reindex to:
start = '2020-01-01'
end = '2020-02-29'
dates = pd.date_range(start, end, freq='D')
dates
When I try to convert the DataFrame to daily frequency using this code:
df_daily = df.reindex(dates, method='ffill')
print(df_daily)
This is the output I get:
            Month  Plan
2020-01-01      1   310
2020-01-02      1   310
2020-01-03      1   310
2020-01-04      1   310
2020-01-05      1   310
2020-01-06      1   310
2020-01-07      1   310
2020-01-08      1   310
2020-01-09      1   310
2020-01-10      1   310
...
The list continues until Feb 29th as expected. However, Plan stays the same for every day of the month. How can I make it look like this?
            Month   Plan
2020-01-01      1     10
2020-01-02      1     10
2020-01-03      1     10
2020-01-04      1     10
2020-01-05      1     10
2020-01-06      1     10
2020-01-07      1     10
2020-01-08      1     10
2020-01-09      1     10
2020-01-10      1     10
...
2020-02-17      2  21.38
2020-02-18      2  21.38
2020-02-19      2  21.38
2020-02-20      2  21.38
2020-02-21      2  21.38
2020-02-22      2  21.38
2020-02-23      2  21.38
2020-02-24      2  21.38
2020-02-25      2  21.38
2020-02-26      2  21.38
2020-02-27      2  21.38
2020-02-28      2  21.38
2020-02-29      2  21.38
That is, distribute each month's plan evenly across its dates by dividing it by the number of days in the month. Since February has a plan of 620 and 29 days in 2020, each day gets 620/29, which is about 21.38.
Answer 1:
pandas exposes the number of days in each date's month through the daysinmonth attribute of a DatetimeIndex:
df_daily["Daily plan"] = df_daily["Plan"] / df_daily.index.daysinmonth
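For reference, here is a minimal end-to-end sketch of this approach, assuming the sample data from the question. The names df, dates, df_daily and the "Daily plan" column all come from the question and the line above; the final print is only there for inspection.

```python
import pandas as pd

# Sample monthly data from the question.
df = pd.DataFrame({'Month': [1, 2],
                   'Plan': [310, 620],
                   'Month_start_date': ['2020-01-01', '2020-02-01']})
df['Month_start_date'] = pd.to_datetime(df['Month_start_date'], format='%Y-%m-%d')
df = df.set_index('Month_start_date')

# Expand to one row per day, carrying each month's values forward.
dates = pd.date_range('2020-01-01', '2020-02-29', freq='D')
df_daily = df.reindex(dates, method='ffill')

# Divide each monthly total by the number of days in that row's month.
df_daily['Daily plan'] = df_daily['Plan'] / df_daily.index.daysinmonth

# Show the first and last day of the range.
print(df_daily.iloc[[0, -1]])
```

Because daysinmonth is computed from the calendar, this should still give the right daily value when the date range covers only part of a month.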
Answer 2:
Keldorn's method is better if you have a convenient helper that tells you the length of each period. Here is a more general approach using groupby():
# EITHER OF THESE:
df.reindex(dates, method='ffill').groupby('Month').transform(lambda x: x/x.size)
df.reindex(dates, method='ffill').groupby('Month').transform(lambda x: x/len(x))
                Plan
2020-01-01  10.00000
2020-01-02  10.00000
...
2020-01-31  10.00000
2020-02-01  21.37931
2020-02-02  21.37931
...
2020-02-29  21.37931
You can then assign the output to a column of the daily DataFrame, for example df_daily['Plan'] or df_daily['Plan_daily'].
Source: https://stackoverflow.com/questions/61517215/changing-monthly-values-to-daily-by-evenly-distributing-between-dates
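Putting the groupby() approach together, here is a sketch that assumes the question's sample data and a date range covering each month completely. The second variant, which groups by the monthly period of the index, is an illustrative addition rather than part of the original answer, as are the column names Plan_daily and Plan_daily_by_period. It is one way to keep the same month from different years apart if the data spans more than one year.

```python
import pandas as pd

# Monthly data from the question, indexed by month start date.
df = pd.DataFrame({'Month': [1, 2], 'Plan': [310, 620]},
                  index=pd.to_datetime(['2020-01-01', '2020-02-01']))
dates = pd.date_range('2020-01-01', '2020-02-29', freq='D')
df_daily = df.reindex(dates, method='ffill')

# Variant from the answer: divide each group's Plan by the group's row count.
df_daily['Plan_daily'] = (df_daily.groupby('Month')['Plan']
                          .transform(lambda x: x / len(x)))

# Hypothetical variant: group by the calendar month of the index instead,
# so that e.g. January 2020 and January 2021 would form separate groups.
period = df_daily.index.to_period('M')
df_daily['Plan_daily_by_period'] = (df_daily.groupby(period)['Plan']
                                    .transform(lambda x: x / len(x)))

print(df_daily.iloc[[0, -1]])
```

Both variants count the rows in each group, so they match the days-in-month result only when the daily range contains every day of each month.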