Question
I am trying to convert a daily-frequency DataFrame to minute data. In a previous post (Conversion of Daily pandas dataframe to minute frequency) it was suggested to use the ffill method below, but it does not seem to work with DataFrames that consist of only 2 rows.
The following DataFrame is the one to be converted:
import pandas as pd
# One record per (ticker, date): 6 tickers on each of 2 days
data = [
    {'ticker': 'jpm',  'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'ge',   'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'fb',   'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'jpm',  'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'ge',   'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'fb',   'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-29', 'returns': 0.2},
]
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date', 'ticker'], drop=True)
This works on the entire dataframe:
df_min = df.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()
But when I work with a smaller DataFrame, for example only its first 2 rows, the result is an empty DataFrame:
df2=df.iloc[0:2,:]
df2_min = df2.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()
Does anyone have an explanation for this odd behaviour?
Edit: I noticed that the code only works if the DataFrame has at least 7 rows.
Answer 1:
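If the input DataFrame has only 2 rows, and both belong to the same date, then reshaping with unstack yields a one-row DataFrame. asfreq builds its new index from the first to the last timestamp, so with only one value in the DatetimeIndex it cannot create a continuous minute index: it returns just the row at 2016-11-28 00:00, which between_time('8:30','16:00') then filters out. This is also why the code needs at least 7 rows: the data has 6 tickers per day, so the 7th row is the first one from a second date.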
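A minimal sketch to confirm this: it rebuilds the question's data and counts the rows after each step of the one-liner for slices of increasing length. The helper name trace is hypothetical, and the minute alias is written as 'min' rather than 'Min'.

```python
import pandas as pd

# Rebuild the question's data: 6 tickers on each of 2 days, every return 0.2.
tickers = ['jpm', 'ge', 'fb', 'aapl', 'msft', 'amzn']
records = [{'ticker': t, 'date': d, 'returns': 0.2}
           for d in ['2016-11-28', '2016-11-29'] for t in tickers]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date', 'ticker'])


def trace(frame):
    """Hypothetical helper: row counts after each step of the one-liner."""
    wide = frame.unstack()                        # one row per distinct date
    minutes = wide.asfreq('min', method='ffill')  # first date 00:00 .. last date 00:00
    session = minutes.between_time('8:30', '16:00')
    return len(wide), len(minutes), len(session)


for n in (2, 6, 7, 12):
    print(n, trace(df.iloc[:n]))
# 2 (1, 1, 0)
# 6 (1, 1, 0)
# 7 (2, 1441, 451)
# 12 (2, 1441, 451)
```

The count of 451 is every minute from 8:30 to 16:00 inclusive on 2016-11-28 only: even with two dates the minute index stops at 2016-11-29 00:00, so the second day contributes no session minutes.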
A possible solution is to add the next day as a helper row after the reshape, fill it with the data of the last real row, apply the original solution, and finally remove the helper row by position with iloc:
df2=df.iloc[0:2]
print (df2)
                   returns
date       ticker
2016-11-28 jpm         0.2
           ge          0.2
df3 = df2.unstack()
print (df3)
ticker jpm ge
date
2016-11-28 0.2 0.2
df3.loc[df3.index.max() + pd.Timedelta(1, unit='d')] = df3.iloc[-1]
print (df3)
           returns
ticker         jpm   ge
date
2016-11-28     0.2  0.2
2016-11-29     0.2  0.2   <- helper row
df_min = df3.asfreq('Min', method='ffill')
print (df_min.tail())
                    returns
ticker                  jpm   ge
date
2016-11-28 23:56:00     0.2  0.2
2016-11-28 23:57:00     0.2  0.2
2016-11-28 23:58:00     0.2  0.2
2016-11-28 23:59:00     0.2  0.2
2016-11-29 00:00:00     0.2  0.2   <- helper row
df_min = df_min.iloc[:-1].between_time('8:30','16:00').stack()
#print (df_min)
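The same steps can be wrapped in a reusable function. This is a minimal sketch, assuming the input has the question's (date, ticker) MultiIndex and one column of values; the name daily_to_minute and its start/end parameters are hypothetical. Because the helper row pushes the minute index one day past the last date, the last date also gets its 8:30 to 16:00 minutes, which the question's one-liner drops even on the full DataFrame.

```python
import pandas as pd


def daily_to_minute(daily, start='8:30', end='16:00'):
    """Hypothetical helper: spread daily (date, ticker) rows over minute bars."""
    wide = daily.unstack()  # dates as rows, tickers as columns (a new object)
    # Helper row one day after the last date, copied from the last real row,
    # so asfreq always has a second timestamp to build the minute range to.
    wide.loc[wide.index.max() + pd.Timedelta(days=1)] = wide.iloc[-1]
    minutes = wide.asfreq('min', method='ffill')
    # Drop the helper row (the last stamp), keep the session window, reshape back.
    return minutes.iloc[:-1].between_time(start, end).stack()


# Question's data: 6 tickers on each of 2 days, every return 0.2.
tickers = ['jpm', 'ge', 'fb', 'aapl', 'msft', 'amzn']
records = [{'ticker': t, 'date': d, 'returns': 0.2}
           for d in ['2016-11-28', '2016-11-29'] for t in tickers]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date', 'ticker'])

print(len(daily_to_minute(df.iloc[0:2])))  # 451 minutes x 2 tickers = 902
print(len(daily_to_minute(df)))            # 451 minutes x 2 days x 6 tickers = 5412
```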
Source: https://stackoverflow.com/questions/56895049/conversion-of-daily-pandas-dataframe-to-minute-frequency-does-not-work-for-2-row