Pandas ffill resampled data grouped by column

北战南征 提交于 2019-12-13 21:03:23

问题


I'm trying to create a data frame from a start date and end date, for a number of asset_id's and turn it into a list of half-hours for each asset_id between the start and end date with the values of some_property filled forward.

I've tried Grouped and resample from the documentation and examples from SO but am stumped how to get this done.

Consider example:

some_time = datetime(2018,4,2,20,20,42)
start_date = datetime(some_time.year,some_time.month,some_time.day).astimezone(pytz.timezone('Europe/London'))
end_date = start_date + timedelta(days=1)
start_date = start_date + timedelta(hours=some_time.hour,minutes=(0 if some_time.minute < 30 else 30 ))
df = pd.DataFrame(['A','B'],columns=['asset_id'])
df2=df.copy()
df['HH'] = start_date
df2['HH'] = end_date
df['some_property']=0
df.loc[df['asset_id']=='B','some_property']=2
df = df.append(df2)

print(df)

Example df:

                          HH asset_id  some_property
0  2018-04-02 20:00:00+01:00        A            0.0
1  2018-04-02 20:00:00+01:00        B            2.0
0  2018-04-03 00:00:00+01:00        A            NaN
1  2018-04-03 00:00:00+01:00        B            NaN

My desired df is this:

                           HH asset_id  some_property
0   2018-04-02 20:00:00+01:00        A              0
4   2018-04-02 20:30:00+01:00        A              0
6   2018-04-02 21:00:00+01:00        A              0
...
2   2018-04-03 00:00:00+01:00        A              0
1   2018-04-02 20:00:00+01:00        B              2
5   2018-04-02 20:30:00+01:00        B              2
...
3   2018-04-03 00:00:00+01:00        B              2

example by dumb for-loop as below.

for ii in range(1,((24-some_time.hour)*2)-1):
    temp_time = start_date + timedelta(minutes=30*ii)
    for jj in ['A','B']:
        df2 = df.append({'asset_id':jj,'HH':temp_time,'some_property':(0 if jj=='A' else 2)},ignore_index=True)
        df = df2
df.loc[2,'some_property']=0
df.loc[3,'some_property']=2

Can anyone help? I'm trying to create dummy data and having trouble. In reality my dummy data will have multiple rows per asset_id before resampling and ffill-ing.

Update

This might be on the right track:

df2 = df.set_index('HH').groupby('asset_id').resample('30T').ffill().drop('asset_id',axis=1)

gives me:

                                   some_property
asset_id HH                                      
A        2018-04-02 20:00:00+01:00            0.0
         2018-04-02 20:30:00+01:00            0.0
         2018-04-02 21:00:00+01:00            0.0
...
         2018-04-03 00:00:00+01:00            NaN
B        2018-04-02 20:00:00+01:00            2.0
         2018-04-02 20:30:00+01:00            2.0
...
         2018-04-03 00:00:00+01:00            NaN

I'm happy to have gained a multindex and can set the last datetime manually (or can exclude row actually).

Does this look like a sensible method? If so, question answered!


回答1:


This isn't quite what my example answer gave but it will do!

df2 = df.set_index('HH').groupby('asset_id').resample('30T').ffill().drop('asset_id',axis=1)

(Got close to this before but typo meant I thought the approach didn't work!)



来源:https://stackoverflow.com/questions/49613440/pandas-ffill-resampled-data-grouped-by-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!