I'm trying to use an SQLAlchemy expression with dask's read_sql_table in order to bring down a dataset that is created by joining and filtering a few different tables. The do
For any others who run across this question: read_sql_table does not seem to support this use case (at this time). If you pass in an SQLAlchemy Select object, it ends up getting wrapped in another SQLAlchemy Select without an alias, which is invalid SQL (at least for PostgreSQL).
Looking at read_sql_table in the dask source, table is the Select object that you pass in, and as you can see below, it gets wrapped in another select:
q = sql.select(columns).where(sql.and_(index >= lower, cond)
                              ).select_from(table)
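
To see the problem concretely, you can reproduce the wrapping by hand. Below is a minimal sketch assuming SQLAlchemy 1.x (list-style select([...]), matching the dask snippet above) and a made-up users table; it is not dask's exact code path:

import sqlalchemy as sa

metadata = sa.MetaData()
users = sa.Table(
    "users", metadata,
    sa.Column("id", sa.Integer),
    sa.Column("age", sa.Integer),
)

# The Select you would like to hand to read_sql_table.
inner = sa.select([users.c.id, users.c.age]).where(users.c.age > 21)

# Roughly what read_sql_table builds around it: another select with the
# original Select sitting in the FROM clause, never given an alias.
outer = sa.select([inner.c.id, inner.c.age]).select_from(inner)

# On SQLAlchemy 1.x this compiles to SELECT ... FROM (SELECT ...) with no
# alias on the subquery, which PostgreSQL rejects with "subquery in FROM
# must have an alias".
print(outer)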
The good news is that the read_sql_table function is relatively straightforward, and the magic is really only a couple of lines that create a dataframe from delayed objects. You just need to write your own logic to break the query into chunks:
parts = []
for query_chunk in queries:
    parts.append(delayed(_read_sql_chunk)(query_chunk, uri, meta, **kwargs))
return from_delayed(parts, meta, divisions=divisions)
def _read_sql_chunk(q, uri, meta, **kwargs):
    df = pd.read_sql(q, uri, **kwargs)
    if df.empty:
        # No rows in this chunk: return the empty meta frame so the
        # partition still has the expected columns and dtypes.
        return meta
    else:
        # Coerce dtypes to match meta so every partition agrees.
        return df.astype(meta.dtypes.to_dict(), copy=False)
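
Putting the pieces together, here is one way the surrounding chunking logic might look. This is just a sketch, not dask's implementation: read_sql_query_chunked is a made-up name, it assumes you partition on an integer index column whose lower and upper bounds you already know, and it reuses the _read_sql_chunk helper from above.

import pandas as pd
import sqlalchemy as sa
from dask import delayed
from dask.dataframe import from_delayed

def read_sql_query_chunked(query, uri, index_col, lower, upper, npartitions, **kwargs):
    # Evenly spaced boundaries over [lower, upper]; real code might query
    # MIN/MAX of the index column instead of taking them as arguments.
    step = (upper - lower) // npartitions
    boundaries = [lower + i * step for i in range(npartitions)] + [upper]

    # Empty frame with the right columns and dtypes: it serves as the
    # schema for from_delayed and as the return value for empty chunks.
    meta = pd.read_sql(query.limit(0), uri, **kwargs)

    parts = []
    for i, (lo, hi) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        # Close the last chunk on the right so the upper boundary row is kept.
        cond = index_col <= hi if i == npartitions - 1 else index_col < hi
        chunk = query.where(sa.and_(index_col >= lo, cond))
        parts.append(delayed(_read_sql_chunk)(chunk, uri, meta, **kwargs))

    # divisions are only valid if each partition comes back indexed by
    # index_col (e.g. by passing index_col=... through to pd.read_sql);
    # otherwise pass divisions=None and let dask treat them as unknown.
    return from_delayed(parts, meta=meta, divisions=boundaries)

So a call might look like ddf = read_sql_query_chunked(my_select, uri, users.c.id, 0, 1000000, 8), where my_select is the joined and filtered Select you were trying to pass to read_sql_table in the first place.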