KDB+ like asof join for timeseries data in pandas?

点点圈 提交于 2019-11-30 01:21:46

As you mentioned in the question, looping through each column should work for you:

df1.apply(lambda x: x.asof(df2.index))

We could potentially create a faster NaN-naive version of DataFrame.asof to do all the columns in one shot. But for now, I think this is the most straightforward way.

I wrote an under-advertised ordered_merge function some time ago:

In [27]: quotes
Out[27]: 
                        time    bid    ask  bsize  asize
0 2012-09-06 09:30:00.026000  13.34  13.44      3     16
1 2012-09-06 09:30:00.043000  13.34  13.44      3     17
2 2012-09-06 09:30:00.121000  13.36  13.65      1     10
3 2012-09-06 09:30:00.386000  13.36  13.52     21      1
4 2012-09-06 09:30:00.440000  13.40  13.44     15     17

In [28]: trades
Out[28]: 
                        time  price   size
0 2012-09-06 09:30:00.439000  13.42  60511
1 2012-09-06 09:30:00.439000  13.42  60511
2 2012-09-06 09:30:02.332000  13.42    100
3 2012-09-06 09:30:02.332000  13.42    100
4 2012-09-06 09:30:02.333000  13.41    100

In [29]: ordered_merge(quotes, trades)
Out[29]: 
                        time    bid    ask  bsize  asize  price   size
0 2012-09-06 09:30:00.026000  13.34  13.44      3     16    NaN    NaN
1 2012-09-06 09:30:00.043000  13.34  13.44      3     17    NaN    NaN
2 2012-09-06 09:30:00.121000  13.36  13.65      1     10    NaN    NaN
3 2012-09-06 09:30:00.386000  13.36  13.52     21      1    NaN    NaN
4 2012-09-06 09:30:00.439000    NaN    NaN    NaN    NaN  13.42  60511
5 2012-09-06 09:30:00.439000    NaN    NaN    NaN    NaN  13.42  60511
6 2012-09-06 09:30:00.440000  13.40  13.44     15     17    NaN    NaN
7 2012-09-06 09:30:02.332000    NaN    NaN    NaN    NaN  13.42    100
8 2012-09-06 09:30:02.332000    NaN    NaN    NaN    NaN  13.42    100
9 2012-09-06 09:30:02.333000    NaN    NaN    NaN    NaN  13.41    100

In [32]: ordered_merge(quotes, trades, fill_method='ffill')
Out[32]: 
                        time    bid    ask  bsize  asize  price   size
0 2012-09-06 09:30:00.026000  13.34  13.44      3     16    NaN    NaN
1 2012-09-06 09:30:00.043000  13.34  13.44      3     17    NaN    NaN
2 2012-09-06 09:30:00.121000  13.36  13.65      1     10    NaN    NaN
3 2012-09-06 09:30:00.386000  13.36  13.52     21      1    NaN    NaN
4 2012-09-06 09:30:00.439000  13.36  13.52     21      1  13.42  60511
5 2012-09-06 09:30:00.439000  13.36  13.52     21      1  13.42  60511
6 2012-09-06 09:30:00.440000  13.40  13.44     15     17  13.42  60511
7 2012-09-06 09:30:02.332000  13.40  13.44     15     17  13.42    100
8 2012-09-06 09:30:02.332000  13.40  13.44     15     17  13.42    100
9 2012-09-06 09:30:02.333000  13.40  13.44     15     17  13.41    100

It could be easily (well, for someone who is familiar with the code) extended to be a "left join" mimicking KDB. I realize in this case that forward-filling the trade data is not appropriate; just illustrating the function.

pandas 0.19 has introduced an asof join:

pd.merge_asof(trades, quotes, on='time')

The semantics are very similar to the functionality in q/kdb+.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!