How to lag data by x specific days on a multi index pandas dataframe?

Submitted by 只谈情不闲聊 on 2021-02-08 09:58:56

Question


I have a dataframe that has dates, assets, and then price/volume data. I'm trying to pull in data from 7 days ago, but the issue is that I can't use shift() because my table has missing dates in it.

date      cusip  price  price_7daysago
1/1/2017  a      1
1/1/2017  b      2
1/2/2017  a      1.2
1/2/2017  b      2.3
1/8/2017  a      1.1    1
1/8/2017  b      2.2    2

I've tried creating a lambda function to try to use loc and timedelta to create this shifting, but I was only able to output empty numpy arrays:

def row_delta(x, df, days, colname):
    if datetime.strptime(x['recorddate'], '%Y%m%d') - timedelta(days) in [datetime.strptime(x,'%Y%m%d') for x in   df['recorddate'].unique().tolist()]:
        return df.loc[(df['recorddate_date'] == df['recorddate_date'] - timedelta(days)) & (df['cusip'] == x['cusip']) ,colname]
    else:
        return 'nothing'

I also thought of doing something similar to this in order to fill in the missing dates, but my issue is that I have two index levels (the dates and the cusips), so I can't just reindex on the dates.

I'm not really sure what else I can do, but any help would be greatly appreciated!


Answer 1:


Merge the DataFrame with itself, adding 7 days to the date column of the right-hand frame. Use the suffixes argument to name the lagged columns appropriately.

import pandas as pd

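df['date'] = pd.to_datetime(df['date'])
# Push the right-hand copy forward 7 days so each row pairs with the row dated 7 days earlier
df = df.merge(df.assign(date=df['date'] + pd.Timedelta(days=7)),
              on=['date', 'cusip'],
              how='left', suffixes=('', '_7daysago'))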

Output: df

        date cusip  price  price_7daysago
0 2017-01-01     a    1.0             NaN
1 2017-01-01     b    2.0             NaN
2 2017-01-02     a    1.2             NaN
3 2017-01-02     b    2.3             NaN
4 2017-01-08     a    1.1             1.0
5 2017-01-08     b    2.2             2.0
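
For a version that can be run end to end, here is a minimal sketch of the same self-merge on the sample data from the question. The helper name add_lag and its parameters are hypothetical, introduced only for illustration; the sketch assumes each (date, cusip) pair appears at most once.

```python
import pandas as pd

# Sample data copied from the question; note the gap between 1/2 and 1/8.
df = pd.DataFrame({
    "date": ["1/1/2017", "1/1/2017", "1/2/2017", "1/2/2017", "1/8/2017", "1/8/2017"],
    "cusip": ["a", "b", "a", "b", "a", "b"],
    "price": [1.0, 2.0, 1.2, 2.3, 1.1, 2.2],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")


def add_lag(frame, days, value_cols, keys=("date", "cusip")):
    """Hypothetical helper: left-join each row to the same cusip `days` days earlier."""
    keys = list(keys)
    lagged = frame[keys + value_cols].copy()
    # Moving the lagged copy forward makes "today" match "today - days" in the original.
    lagged["date"] = lagged["date"] + pd.Timedelta(days=days)
    return frame.merge(lagged, on=keys, how="left",
                       suffixes=("", f"_{days}daysago"))


result = add_lag(df, 7, ["price"])
print(result)
```

Because this is a join on exact dates, a row only receives a lagged value when the same cusip has a record exactly 7 days earlier; otherwise it stays NaN. If a (date, cusip) pair were duplicated, the merge would multiply rows, so deduplicating first would be advisable.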



Answer 2:


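You can set date and cusip as the index and use unstack and shift together. Pass freq="D" so that shift moves the dates by 7 calendar days; a plain shift(7) moves the values down by 7 rows, which gives the wrong result when dates are missing. This requires date to already be a datetime column.

shifted = df.set_index(["date", "cusip"]).unstack().shift(7, freq="D").stack()

Then simply merge shifted with your original df.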
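
As a rough sketch of how the unstack/shift idea can be made to respect calendar days, the example below passes freq="D" to shift, which moves the date labels forward instead of moving values down by rows. The sample frame is the one from the question; the variable names wide and lagged and the column name price_7daysago are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question; note the gap between 1/2 and 1/8.
df = pd.DataFrame({
    "date": ["1/1/2017", "1/1/2017", "1/2/2017", "1/2/2017", "1/8/2017", "1/8/2017"],
    "cusip": ["a", "b", "a", "b", "a", "b"],
    "price": [1.0, 2.0, 1.2, 2.3, 1.1, 2.2],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# One row per date, one column per cusip.
wide = df.set_index(["date", "cusip"])["price"].unstack("cusip")

# freq="D" shifts the index labels by 7 calendar days; the values stay attached
# to their (now later) dates, so gaps in the calendar do not matter.
lagged = (
    wide.shift(7, freq="D")
        .stack()
        .rename("price_7daysago")
        .reset_index()
)

# Join the lagged prices back onto the original long-format frame.
result = df.merge(lagged, on=["date", "cusip"], how="left")
print(result)
```

On this sample the result should match the self-merge approach: only the 1/8/2017 rows find a price from exactly 7 days earlier.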



Source: https://stackoverflow.com/questions/52435070/how-to-lag-data-by-x-specific-days-on-a-multi-index-pandas-dataframe
