Question
I am trying to translate a fairly short bit of SQL into a SQLAlchemy ORM query. The SQL uses PostgreSQL's generate_series to build a set of dates, and my goal is to produce a set of time-series arrays grouped by one of the columns.
The tables (simplified) are:
counts:
-----------------
count (Integer)
day (Date)
placeID (foreign key to places.id)
"counts_pkey" PRIMARY KEY (day, placeID)
places:
-----------------
id
name (varchar)
The output I'm after is a time series of counts for each place, with NULL for any day on which no count was reported. For example, over a four-day range the result would look like this:
array_agg | name
-----------------+-------------------
{NULL,0,7,NULL} | A Place
{NULL,1,NULL,2} | Some other place
{5,NULL,3,NULL} | Yet another
I can do this fairly easily in SQL by CROSS JOINing a generated date range with places and then LEFT OUTER JOINing counts onto the result:
SELECT array_agg(counts.count), places.name
FROM generate_series('2018-11-01', '2018-11-04', interval '1 days') as day
CROSS JOIN places
LEFT OUTER JOIN counts on counts.day = day.day AND counts.PlaceID = places.id
GROUP BY places.name;
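To make the semantics of that query concrete, here is a minimal pure-Python sketch of what it computes: every (place, day) pair produced by the cross join, with the LEFT OUTER JOIN supplying either the count or None (SQL NULL). The function names and sample rows are hypothetical, chosen only to reproduce the example output above. One caveat: PostgreSQL does not guarantee the element order of array_agg unless the call has its own ORDER BY (for example array_agg(counts.count ORDER BY day.day)); the sketch sidesteps this by iterating the days in order.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def date_range(start: date, end: date) -> List[date]:
    """Inclusive list of days, like generate_series(start, end, '1 day')."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def time_series(
    places: Dict[int, str],
    counts: Dict[Tuple[date, int], int],
    start: date,
    end: date,
) -> Dict[str, List[Optional[int]]]:
    days = date_range(start, end)
    result: Dict[str, List[Optional[int]]] = {}
    # CROSS JOIN: every place is paired with every day in the range.
    for place_id, name in places.items():
        # LEFT OUTER JOIN: a missing (day, place) count becomes None (SQL NULL).
        # Keyed by name to mirror GROUP BY places.name.
        result[name] = [counts.get((day, place_id)) for day in days]
    return result


# Hypothetical sample data, keyed like the (day, placeID) primary key.
places = {1: "A Place", 2: "Some other place", 3: "Yet another"}
counts = {
    (date(2018, 11, 2), 1): 0,
    (date(2018, 11, 3), 1): 7,
    (date(2018, 11, 2), 2): 1,
    (date(2018, 11, 4), 2): 2,
    (date(2018, 11, 1), 3): 5,
    (date(2018, 11, 3), 3): 3,
}

for name, series in time_series(places, counts, date(2018, 11, 1), date(2018, 11, 4)).items():
    print(series, name)
```

Note that a reported count of 0 stays 0; only a missing row becomes None.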
What I can't figure out is how to get SQLAlchemy to do this. After a lot of digging I found an old Google Groups thread that almost works, which led me to this:
from sqlalchemy import column, func, select

date_list = select([column('generate_series')])\
    .select_from(func.generate_series(backthen, today, '1 day'))\
    .alias('date_list')
time_series = db.session.query(Place.name, func.array_agg(Count.count))\
    .select_from(date_list)\
    .outerjoin(Count, (Count.day == date_list.c.generate_series) & (Count.placeID == Place.id ))\
    .group_by(Place.name)
This creates a sub-select for the time series, but it produces a database error:
There is an entry for table "places", but it cannot be referenced from this part of the query.
So my question is: how would you do this in SQLAlchemy? I'm also open to the idea that this is difficult because my SQL approach is bone-headed.
Answer 1:
The problem is that, given this query construct, SQLAlchemy produces a query along the lines of
SELECT ...
FROM places,
(...) AS date_list LEFT OUTER JOIN counts ON ... AND counts."placeID" = places.id
...
There are two FROM-list items: places and the join. FROM-list items cannot reference each other¹, hence the error caused by places.id in the ON clause.
SQLAlchemy has no explicit CROSS JOIN construct, but a CROSS JOIN is equivalent to an INNER JOIN ON (TRUE). You can also skip wrapping the function expression in a subquery and use it directly by giving it an alias:
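from sqlalchemy import column, func, true

# Alias the function call itself; PostgreSQL names its single output column "gen_day".
# .join(date_list, true()) renders INNER JOIN ... ON true, i.e. a CROSS JOIN.
date_list = func.generate_series(backthen, today, '1 day').alias('gen_day')
time_series = session.query(Place.name, func.array_agg(Count.count))\
    .join(date_list, true())\
    .outerjoin(Count, (Count.day == column('gen_day')) &
                      (Count.placeID == Place.id ))\
    .group_by(Place.name)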
1: Except for function calls in the FROM list, or items marked LATERAL.
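The equivalence the answer relies on, that a CROSS JOIN returns the same rows as an inner join on a constant-true condition, is easy to check. The sketch below uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made only so it runs without a server): a recursive CTE stands in for generate_series, the table and column names mirror the question's simplified schema, and the rows are hypothetical. SQLite has no array_agg, so the per-place arrays are assembled in Python from rows ordered by name and day.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE counts (
        count INTEGER,
        day TEXT,
        placeID INTEGER REFERENCES places(id),
        PRIMARY KEY (day, placeID)
    );
    INSERT INTO places VALUES (1, 'A Place'), (2, 'Some other place');
    INSERT INTO counts VALUES (0, '2018-11-02', 1), (7, '2018-11-03', 1),
                              (1, '2018-11-02', 2), (2, '2018-11-04', 2);
    """
)

# Recursive CTE standing in for PostgreSQL's generate_series over four days.
DAYS = """
WITH RECURSIVE days(day) AS (
    SELECT '2018-11-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2018-11-04'
)
"""


def rows(join_clause: str):
    # join_clause is one of two hard-coded strings below, never user input.
    sql = DAYS + f"""
    SELECT places.name, days.day, counts.count
    FROM days
    {join_clause}
    LEFT OUTER JOIN counts
        ON counts.day = days.day AND counts.placeID = places.id
    ORDER BY places.name, days.day
    """
    return conn.execute(sql).fetchall()


cross = rows("CROSS JOIN places")
on_true = rows("JOIN places ON 1")  # inner join on a constant-true condition
assert cross == on_true

series = defaultdict(list)
for name, _day, count in on_true:
    series[name].append(count)
print(dict(series))
```

The explicit ORDER BY is what makes each list come out in date order; in PostgreSQL the same guarantee comes from an ORDER BY inside the array_agg call.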
Source: https://stackoverflow.com/questions/53137019/using-function-output-in-sqlalchemy-join-clause