How to efficiently fill a column of dataframe with values calculated from other dataframes

房东的猫 提交于 2019-12-12 17:48:34

问题


I am trying to fill a dataframe (elist) with cumulative returns of firms and cumulative market returns. This can be done by looping through the elist dataframe using iterrows, see this link. However, this is slow.

I am looking for a more potent, faster solution.

The raw returns that are the inputs for the cumulative return calculations originate from two dataframes (ri, rm). The results should then be recorded in columns of elist. See example below, using the data in this file.

Before running the iterrows loop, elist looks like:

    permno  begdat      enddat   return vwretd
0   11628   2012-03-31  2013-03-31  NaN NaN
1   11628   2012-06-30  2013-06-30  NaN NaN
2   11628   2012-09-30  2013-09-30  NaN NaN
3   11628   2012-12-31  2013-12-31  NaN NaN
4   11628   2013-03-31  2014-03-31  NaN NaN

After running the loop elist should look like:

    permno   begdat     enddat      return      vwretd
0   11628   2012-03-31  2013-03-31  0.212355    0.133429
1   11628   2012-06-30  2013-06-30  0.274788    0.198380
2   11628   2012-09-30  2013-09-30  0.243590    0.198079
3   11628   2012-12-31  2013-12-31  0.299277    0.304479
4   11628   2013-03-31  2014-03-31  0.303147    0.208454

Here is the code that relies on iterrows, which is slow:

import os,sys
import pandas as pd
import numpy as np
rm     = pd.read_csv('rm_so.csv') # market return
ri     = pd.read_csv('ri_so.csv') # firm return
elist  = pd.read_csv('elist_so.csv') # table to be filled with cumlative returns over a period (begdat to enddat)
for index, row in elist.iterrows():
    #fill cumulative market return
    elist.loc[index, 'vwretd']=rm.loc[(rm['date']>row['begdat']) & (rm['date']<=row['enddat']),'vwretd'].product()-1    
    #fill cumulative firm return
    r = ri.loc[(ri['permno']==row['permno']),]
    elist.loc[index, 'return'] = r.loc[(r['date']>row['begdat']) & (r['date']<=row['enddat']),'ret'].product()-1

It would be great to see this process running faster!

来源:https://stackoverflow.com/questions/42601765/how-to-efficiently-fill-a-column-of-dataframe-with-values-calculated-from-other

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!