Dask read_sql_table errors out when using an SQLAlchemy expression

Submitted by 为君一笑 on 2019-12-02 07:38:50

The query sent on that line is auto-generated by SQLAlchemy, so the syntax ought to be correct. However, I notice that your original query includes a .limit() modifier. The purpose of the head = line is to fetch the first few rows so that Dask can infer column types. If the original query already carries a limit clause, the two limits may conflict. Please try a query without .limit().
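A minimal sketch of that suggestion (the events table, the URI, and the column names are made up for illustration; they are not from the question):

import sqlalchemy as sa
import dask.dataframe as dd

metadata = sa.MetaData()
events = sa.Table(
    "events", metadata,
    sa.Column("id", sa.Integer),
    sa.Column("value", sa.Float),
)

# query = sa.select([events]).limit(1000)  # a trailing .limit() can clash with
#                                          # the LIMIT Dask adds for its head query
query = sa.select([events])                # let Dask apply its own limit

ddf = dd.read_sql_table(query, "postgresql:///mydb", index_col="id")

Whether this is enough depends on your Dask version; the answer below explains why a Select object may still fail.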

For anyone else who runs across this question: read_sql_table does not seem to support this use case (at the time of writing). If you pass in an SQLAlchemy Select object, it ends up wrapped in another SQLAlchemy Select without an alias, which is invalid SQL (at least for PostgreSQL).

Looking at read_sql_table in the Dask source, table is the Select object that was passed in, and as the snippet below shows, it gets wrapped in another select:

q = sql.select(columns).where(
    sql.and_(index >= lower, cond)
).select_from(table)
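To see why the missing alias matters, here is a rough sketch of the double wrap in isolation (the table and column names are made up, and the exact rendered SQL depends on your SQLAlchemy version and dialect):

import sqlalchemy as sa

events = sa.table("events", sa.column("id"), sa.column("value"))

inner = sa.select([events.c.id, events.c.value])   # the Select you hand to Dask

# Roughly what read_sql_table builds; note there is no .alias() on inner:
wrapped = sa.select([inner.c.id, inner.c.value]).where(
    inner.c.id >= 0
).select_from(inner)

print(wrapped)
# SELECT id, value
# FROM (SELECT events.id, events.value FROM events)  <- subquery has no alias,
# WHERE id >= 0                                         so PostgreSQL rejects it

Calling inner.alias() before wrapping would produce valid SQL instead.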

The good news is that the read_sql_table function is relatively straightforward, and the magic is really only a couple of lines that create a dataframe from delayed objects. You just need to write your own logic to break the query into chunks:

import pandas as pd
from dask import delayed
from dask.dataframe import from_delayed

# Build one delayed partition per chunk query; each chunk runs independently.
parts = []
for query_chunk in queries:
    parts.append(delayed(_read_sql_chunk)(query_chunk, uri, meta, **kwargs))

return from_delayed(parts, meta, divisions=divisions)


def _read_sql_chunk(q, uri, meta, **kwargs):
    df = pd.read_sql(q, uri, **kwargs)
    if df.empty:
        # An empty chunk still needs the right schema, so return meta itself.
        return meta
    else:
        # Coerce dtypes to match meta so every partition agrees.
        return df.astype(meta.dtypes.to_dict(), copy=False)
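Putting it together, here is a fuller sketch of that idea (my own construction: read_select_chunked, the explicit divisions list, and the "sub" alias name are all illustrative, not Dask API). It aliases the Select before wrapping it, which sidesteps the missing-alias problem, and reuses _read_sql_chunk from above:

import pandas as pd
import sqlalchemy as sa
from dask import delayed
from dask.dataframe import from_delayed


def read_select_chunked(query, uri, index_col, divisions, **kwargs):
    # Alias the user's Select so the wrapped SQL is valid (PostgreSQL
    # requires an alias on a subquery in FROM).
    sub = query.alias("sub")
    index = sub.c[index_col]

    # Have pandas set the index column on every chunk it reads, so the
    # partitions line up with the divisions.
    kwargs["index_col"] = index_col

    # Read a handful of rows to build the empty meta frame Dask needs.
    head = pd.read_sql(sa.select(list(sub.columns)).limit(5), uri, **kwargs)
    meta = head.iloc[:0]

    # One Select per partition, bounded by consecutive division values
    # (the last partition's upper bound is inclusive, as in Dask itself).
    queries = []
    for i, (lower, upper) in enumerate(zip(divisions[:-1], divisions[1:])):
        last = i == len(divisions) - 2
        cond = index <= upper if last else index < upper
        queries.append(
            sa.select(list(sub.columns)).where(sa.and_(index >= lower, cond))
        )

    parts = [delayed(_read_sql_chunk)(q, uri, meta, **kwargs) for q in queries]
    return from_delayed(parts, meta, divisions=divisions)

Calling it would look like read_select_chunked(my_select, "postgresql:///mydb", "id", divisions=[0, 1000, 2000, 3000]), where my_select is the SQLAlchemy Select you originally tried to pass to read_sql_table.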