pandas DataFrame drops index when passing to kdb+ (using qPython API)

别说谁变了你拦得住时间么 提交于 2020-01-15 12:34:07

问题


I am trying to pass time-series data from Python to q/kdb+.

One solution out there is qPython module, offering seamless conversion from q table/dictionary to Pandas.

The problem is when trying to pass from Pandas to q, the time index in DataFrame (in the column Date) doesn't quite make it to the q side. Reproducible code:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#    Open  High  Low  Close    Volume  Adj Close
# 0 10.17 10.28 10.05 10.28  60855800       9.43 
# 1 10.45 11.24 10.40 10.96 215620200      10.05
# 2 11.21 11.46 11.13 11.37 200070600      10.43
# 3 11.46 11.69 11.32 11.66 130201700      10.69
# 4 11.67 11.74 11.46 11.69 130463000      10.72

As you can see, the q table doesn't have the Date column that was present in f DataFrame as index.

How to efficiently (for large data) pass the datetime index to q?


回答1:


While serializing DataFrame objects the qPython checks for the presence of meta attribute. If the attribute is not present, DataFrame is serialized as q table and index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide type hinting to enforce representation a q keyed table.

Please take a look at the modified sample:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

from qpython import MetaData
from qpython.qtype import QKEYED_TABLE


start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date                                                         
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72



回答2:


Have you considered trying one of the other APIs? http://www.timestored.com/kdb-guides/python-api I would also report the issue on their github page.



来源:https://stackoverflow.com/questions/28385137/pandas-dataframe-drops-index-when-passing-to-kdb-using-qpython-api

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!