How to iterate through a Firebird Database without having to completely load it inside my program?

岁酱吖の 提交于 2020-01-14 06:41:17

问题


I've recently began to work with Database queries when I was asked to develop a program that would have read data from the last 1 month in a Firebird database with almost 100M rows.

After stumbling a little bit, I finally managed to filter the database, using Python (and, more specifically, Pandas library), but the code takes more than 8 hours just to filter the data, so it becomes useless when trying to realize the task with the required frequency.

The rest of the code runs really quickly, since I just need around the 3000 last rows of the dataset.

So far, my function responsible to execute the query is:

def read_query(access):

    start_time = time.time()

    conn = pyodbc.connect(access)

    df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)

Or, isolating the query:

SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'

Since I will only need a X number of rows from the bottom of the database (where this X number changes everyday), I know I could optimize my code by just reading part of the database, starting from the last rows, iterating through each one of the rows, without having to process the entire dataframe.

So my question is: how can it be done? And if it's not a good idea/approach whatelse could I do to solve this issue?


回答1:


I think chunksize is your way out, please check the documentation here:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query.html

and also the examples posted here:

http://shichaoji.com/2016/10/11/python-iterators-loading-data-in-chunks/#Loading-data-in-chunks

Good luck!



来源:https://stackoverflow.com/questions/53750647/how-to-iterate-through-a-firebird-database-without-having-to-completely-load-it

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!