I'm trying to use an SQLAlchemy expression with dask's read_sql_table in order to bring down a dataset that is created by joining and filtering a few different tables. The do
For any others who run across this question: read_sql_table does not seem to support this use case (at this time). If you pass in an SQLAlchemy Select object, it ends up getting wrapped in another SQLAlchemy Select without an alias, which is invalid SQL (at least for PostgreSQL).
Looking at read_sql_table in the dask source, table is the Select object that you pass in, and as you can see below, it gets wrapped in another select:
q = sql.select(columns).where(sql.and_(index >= lower, cond)
                              ).select_from(table)
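
To see the problem concretely, you can reproduce the wrapping by hand. Below is a minimal sketch assuming SQLAlchemy 1.x (list-style select([...]), matching the dask snippet above) and a made-up users table; it is not dask's exact code path:

import sqlalchemy as sa

metadata = sa.MetaData()
users = sa.Table(
    "users", metadata,
    sa.Column("id", sa.Integer),
    sa.Column("age", sa.Integer),
)

# The Select you would like to hand to read_sql_table.
inner = sa.select([users.c.id, users.c.age]).where(users.c.age > 21)

# Roughly what read_sql_table builds around it: another select with the
# original Select sitting in the FROM clause, never given an alias.
outer = sa.select([inner.c.id, inner.c.age]).select_from(inner)

# On SQLAlchemy 1.x this compiles to SELECT ... FROM (SELECT ...) with no
# alias on the subquery, which PostgreSQL rejects with "subquery in FROM
# must have an alias".
print(outer)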
The good news is that the read_sql_table function is relatively straightforward, and the magic is really only a couple of lines that create a dataframe from delayed objects. You just need to write your own logic to break the query into chunks:
parts = []
for query_chunk in queries:
    parts.append(delayed(_read_sql_chunk)(query_chunk, uri, meta, **kwargs))
return from_delayed(parts, meta, divisions=divisions)
def _read_sql_chunk(q, uri, meta, **kwargs):
    df = pd.read_sql(q, uri, **kwargs)
    if df.empty:
        # No rows in this chunk: return the empty meta frame so the
        # partition still has the expected columns and dtypes.
        return meta
    else:
        # Coerce dtypes to match meta so every partition agrees.
        return df.astype(meta.dtypes.to_dict(), copy=False)
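
Putting the pieces together, here is one way the surrounding chunking logic might look. This is just a sketch, not dask's implementation: read_sql_query_chunked is a made-up name, it assumes you partition on an integer index column whose lower and upper bounds you already know, and it reuses the _read_sql_chunk helper from above.

import pandas as pd
import sqlalchemy as sa
from dask import delayed
from dask.dataframe import from_delayed

def read_sql_query_chunked(query, uri, index_col, lower, upper, npartitions, **kwargs):
    # Evenly spaced boundaries over [lower, upper]; real code might query
    # MIN/MAX of the index column instead of taking them as arguments.
    step = (upper - lower) // npartitions
    boundaries = [lower + i * step for i in range(npartitions)] + [upper]

    # Empty frame with the right columns and dtypes: it serves as the
    # schema for from_delayed and as the return value for empty chunks.
    meta = pd.read_sql(query.limit(0), uri, **kwargs)

    parts = []
    for i, (lo, hi) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        # Close the last chunk on the right so the upper boundary row is kept.
        cond = index_col <= hi if i == npartitions - 1 else index_col < hi
        chunk = query.where(sa.and_(index_col >= lo, cond))
        parts.append(delayed(_read_sql_chunk)(chunk, uri, meta, **kwargs))

    # divisions are only valid if each partition comes back indexed by
    # index_col (e.g. by passing index_col=... through to pd.read_sql);
    # otherwise pass divisions=None and let dask treat them as unknown.
    return from_delayed(parts, meta=meta, divisions=boundaries)

So a call might look like ddf = read_sql_query_chunked(my_select, uri, users.c.id, 0, 1000000, 8), where my_select is the joined and filtered Select you were trying to pass to read_sql_table in the first place.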