Question
I have a pandas DataFrame with a start date, an end date, and a quantity for each contract. How can I break out the individual months each contract covers so they can be aggregated and graphed?
For example, given:
Start Date  End Date   Demanded  Customer
1/1/2017    3/31/2017  100       A
2/1/2017    3/31/2017  50        B
I want to expand this into one row per month, like so:
Month     Demand  Customer
1/1/2017  100     A
2/1/2017  100     A
3/1/2017  100     A
2/1/2017  50      B
3/1/2017  50      B
The end goal is to pivot this and graph it, with months on the x-axis and total demand on the y-axis.
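The pivot-and-graph step described above could be sketched as follows. This is a minimal sketch, not from the thread: it builds the desired one-row-per-month table by hand (column names Month, Demand, Customer as in the question), uses pivot_table for a per-customer view and a groupby sum for total demand per month. Plotting through DataFrame.plot assumes matplotlib is installed, so it is left as a comment.

```python
import pandas as pd

# The desired one-row-per-month table from the question, built by hand
expanded = pd.DataFrame({
    'Month': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01',
                             '2017-02-01', '2017-03-01']),
    'Demand': [100, 100, 100, 50, 50],
    'Customer': ['A', 'A', 'A', 'B', 'B'],
})

# Months as rows, customers as columns; a month with no contract for a customer becomes 0
by_customer = expanded.pivot_table(index='Month', columns='Customer',
                                   values='Demand', aggfunc='sum', fill_value=0)

# Total demand per month across all customers (the y-axis of the desired chart)
total = expanded.groupby('Month')['Demand'].sum()

print(by_customer)
print(total)

# With matplotlib installed (assumption):
# total.plot(kind='bar')  or  by_customer.plot(kind='bar', stacked=True)
```

A stacked bar of by_customer shows the same totals as total while keeping each customer's share visible.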
Answer 1:
You can first convert the date columns with to_datetime. Then use itertuples and date_range with frequency MS (month start), together with concat, to build a new expanded DataFrame. Finally, join the original columns Quantity Demanded and Customer:
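import pandas as pd

# Underscored copies, because itertuples cannot expose column names containing spaces as attributes
df['Start_Date'] = pd.to_datetime(df['Start Date'])
df['End_Date'] = pd.to_datetime(df['End Date'])
# One Series per contract: indexed by each month start in its range, valued with the original row label
df1 = (pd.concat([pd.Series(r.Index,
                            pd.date_range(r.Start_Date, r.End_Date, freq='MS'))
                  for r in df.itertuples()])
       .reset_index())
df1.columns = ['Month', 'idx']
print(df1)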
Month idx
0 2017-01-01 0
1 2017-02-01 0
2 2017-03-01 0
3 2017-02-01 1
4 2017-03-01 1
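The Month values above come from date_range with freq='MS', which returns every month start that falls inside a contract's range. A small check of that behaviour follows; the mid-month start date 1/15/2017 is hypothetical (not in the question) and shows that the partial first month is skipped unless the start is first snapped back to the first of its month, here via to_period('M').to_timestamp().

```python
import pandas as pd

# Month starts covered by contract A (1/1/2017 to 3/31/2017)
months = pd.date_range(pd.to_datetime('1/1/2017'), pd.to_datetime('3/31/2017'), freq='MS')
print(list(months.strftime('%Y-%m-%d')))  # ['2017-01-01', '2017-02-01', '2017-03-01']

# Hypothetical mid-month start: no month start falls between 1/15 and 1/31, so January is skipped
mid = pd.date_range(pd.to_datetime('1/15/2017'), pd.to_datetime('3/31/2017'), freq='MS')
print(list(mid.strftime('%Y-%m-%d')))  # ['2017-02-01', '2017-03-01']

# Snapping the start back to the first of its month keeps January
start = pd.to_datetime('1/15/2017').to_period('M').to_timestamp()
fixed = pd.date_range(start, pd.to_datetime('3/31/2017'), freq='MS')
print(list(fixed.strftime('%Y-%m-%d')))  # ['2017-01-01', '2017-02-01', '2017-03-01']
```

Whether a partial month should count toward demand depends on the data; with the question's dates, which all start on the first of a month, the two variants agree.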
# Use the original row label (idx) to pull in each contract's quantity and customer
df2 = df1.set_index('idx').join(df[['Quantity Demanded', 'Customer']]).reset_index(drop=True)
print(df2)
Month Quantity Demanded Customer
0 2017-01-01 100 A
1 2017-02-01 100 A
2 2017-03-01 100 A
3 2017-02-01 50 B
4 2017-03-01 50 B
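As an alternative to concat over itertuples (a different technique, not part of this answer), DataFrame.explode, available since pandas 0.25, can turn a per-row list of month starts into one row per month. A minimal sketch on the question's sample data, assuming the quantity column is named Demanded as in the question's table:

```python
import pandas as pd

# Sample data from the question (quantity column assumed to be 'Demanded')
df = pd.DataFrame({
    'Start Date': pd.to_datetime(['1/1/2017', '2/1/2017']),
    'End Date': pd.to_datetime(['3/31/2017', '3/31/2017']),
    'Demanded': [100, 50],
    'Customer': ['A', 'B'],
})

# For each contract, a Python list of the month starts it covers
month_lists = pd.Series(
    [list(pd.date_range(s, e, freq='MS'))
     for s, e in zip(df['Start Date'], df['End Date'])],
    index=df.index,
)

# explode emits one row per list element, repeating the other columns
out = (df.assign(Month=month_lists)
         .explode('Month')
         .reset_index(drop=True))
out['Month'] = pd.to_datetime(out['Month'])  # explode leaves an object column
out = out[['Month', 'Demanded', 'Customer']]
print(out)
```

This avoids building one Series per contract and the later join, since the other columns are carried along by explode.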
Answer 2:
Using melt, then resample('MS'):
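import pandas as pd

df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])
# Stack both date columns into a single 'Date' column (two rows per contract)
d1 = pd.melt(
    df, ['Demanded', 'Customer'],
    ['Start Date', 'End Date'],
    value_name='Date'
).drop(columns='variable').set_index('Date')
# Per customer, upsample to month starts and forward-fill Demanded and Customer
result = d1.groupby('Customer').apply(lambda g: g.resample('MS').ffill()) \
    .reset_index(0, drop=True) \
    .reset_index()
print(result)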
Date Demanded Customer
0 2017-01-01 100 A
1 2017-02-01 100 A
2 2017-03-01 100 A
3 2017-02-01 50 B
4 2017-03-01 50 B
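To see what melt contributes before resampling, here is a sketch of the intermediate long frame for the question's sample data (column names assumed as in this answer). Each contract becomes two rows, one per boundary date, and the variable column records which original column each date came from, which is why the answer drops it before setting Date as the index.

```python
import pandas as pd

# Sample data from the question (quantity column assumed to be 'Demanded')
df = pd.DataFrame({
    'Start Date': pd.to_datetime(['1/1/2017', '2/1/2017']),
    'End Date': pd.to_datetime(['3/31/2017', '3/31/2017']),
    'Demanded': [100, 50],
    'Customer': ['A', 'B'],
})

# All Start Date rows come first, then all End Date rows
long = pd.melt(df, id_vars=['Demanded', 'Customer'],
               value_vars=['Start Date', 'End Date'], value_name='Date')
print(long)
```

resample('MS') then only has to fill in the month starts between each customer's two boundary rows.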
Source: https://stackoverflow.com/questions/41902835/strip-out-months-from-two-date-columns