Lag values and differences in pandas dataframe with missing quarterly data

旧巷老猫 提交于 2019-12-12 02:22:49

问题


Though Pandas has time series functionality, I am still struggling with dataframes that have incomplete time series data.

See the pictures below, the lower picture has complete data, the upper has gaps. Both pics show correct values. In red are the columns that I want to calculate using the data in black. Column Cumm_Issd shows the accumulated issued shares during the year, MV is market value.

I want to calculate the issued shares per quarter (IssdQtr), the quarterly change in Market Value (D_MV_Q) and the MV of last year (L_MV_Y).

See for underlying cvs data this link for the full data and this link for the gapped data. There are two firms 1020180 and 1020201.

However, when I try Pandas shift method it fails when there are gaps, try yourself using the csv files and the code below. All columns (DiffEq, Dif1MV, Lag4MV) differ - for some quarters - from IssdQtr, D_MV_Q, L_MV_Y, respectively.

Are there ways to deal with gaps in data using Pandas?

import pandas as pd
import numpy as np
import os

dfg = pd.read_csv('example_soverflow_gaps.csv',low_memory=False)
dfg['date']     = pd.to_datetime(dfg['Period'], format='%Y%m%d')
dfg['Q']        = pd.DatetimeIndex(dfg['date']).to_period('Q')
dfg['year']     = dfg['date'].dt.year

dfg['DiffEq']    = dfg.sort_values(['Q']).groupby(['Firm','year'])['Cumm_Issd'].diff()
dfg['Dif1MV']    = dfg.groupby(['Firm'])['MV'].diff(1)
dfg['Lag4MV']    = dfg.groupby(['Firm'])['MV'].shift(4)

Gapped data:

Full data:


回答1:


Solved the basic problem by using a merge. First, create a variable that shows the lagged date or quarter. Here we want last year's MV (4 quarters back):

from pandas.tseries.offsets import QuarterEnd 
dfg['lagQ'] = dfg['date'] + QuarterEnd(-4)

Then create a data-frame with the keys (Firm and date) and the relevant variable (here MV).

lagset=dfg[['Firm','date', 'MV']].copy()
lagset.rename(columns={'MV':'Lag_MV', 'date':'lagQ'}, inplace=True)

Lastly, merge the new frame into the existing one:

dfg=pd.merge(dfg, lagset,  on=['Firm', 'lagQ'], how='left')


来源:https://stackoverflow.com/questions/42894968/lag-values-and-differences-in-pandas-dataframe-with-missing-quarterly-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!