I am trying to do a simple filter operation on a query in SQLAlchemy, where inall is a list of roughly one million rsids, like this:
q = session.query(Genotypes).filter(Genotypes.rsid.in_(inall))
If the table you are getting your rsids from is available in the same database, I'd use a subquery to feed them into your Genotypes query rather than passing the one million entries around in your Python code.
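# Select only the rsid column (column name assumed) so IN compares against a single column;
# SQLAlchemy renders this query as a subquery inside IN (...), so the database does the matching
sq = session.query(RSID_Source.rsid)
q = session.query(Genotypes).filter(Genotypes.rsid.in_(sq))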
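To try the subquery approach end to end, here is a minimal, self-contained sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later, and the models (Genotypes with an rsid column, RSID_Source holding the rsids you want to keep) are hypothetical stand-ins for your real tables.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


# Hypothetical model: the big table you want to filter
class Genotypes(Base):
    __tablename__ = "genotypes"
    id = Column(Integer, primary_key=True)
    rsid = Column(String, index=True)


# Hypothetical model: the table that already holds the rsids to keep
class RSID_Source(Base):
    __tablename__ = "rsid_source"
    rsid = Column(String, primary_key=True)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Genotypes(rsid=r) for r in ["rs1", "rs2", "rs3", "rs4"]])
    session.add_all([RSID_Source(rsid=r) for r in ["rs2", "rs4"]])
    session.commit()

    # The filter values stay in the database: no Python list, no per-value parameters
    sq = session.query(RSID_Source.rsid)
    q = session.query(Genotypes).filter(Genotypes.rsid.in_(sq))

    sql_text = str(q)  # the generated SQL, with a SELECT nested inside IN (...)
    matched = sorted(g.rsid for g in q)

print(matched)
```

Only rows whose rsid appears in rsid_source come back (rs2 and rs4 here), and the statement sent to SQLite carries no bound parameter per rsid.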
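The issue is that to pass that list to SQLite (or any database, really), SQLAlchemy has to send each entry in your IN clause as a separate bound parameter, and SQLite caps how many parameters a single statement may use (SQLITE_MAX_VARIABLE_NUMBER, which defaulted to 999 before SQLite 3.32.0). The SQL translates roughly to: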
-- Not valid SQLite SQL
DECLARE @Param1 TEXT;
SET @Param1 = ?;
DECLARE @Param2 TEXT;
SET @Param2 = ?;
-- snip 999,998 more
SELECT field1, field2, -- etc.
FROM Genotypes G
WHERE G.rsid IN (@Param1, @Param2, /* snip */)
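For comparison, the valid SQLite form of that statement is simply WHERE rsid IN (?, ?, ...), with one ? placeholder per value. A minimal sketch with Python's built-in sqlite3 module makes that visible; the table contents and the in_clause_sql helper are hypothetical, for illustration only.

```python
import sqlite3


def in_clause_sql(values):
    # Hypothetical helper: build one "?" placeholder per value,
    # i.e. one bound parameter for every entry in the IN list
    placeholders = ", ".join("?" for _ in values)
    return f"SELECT rsid FROM genotypes WHERE rsid IN ({placeholders})"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genotypes (rsid TEXT)")
conn.executemany("INSERT INTO genotypes VALUES (?)", [("rs1",), ("rs2",), ("rs3",)])

wanted = ["rs1", "rs3", "rs9"]
sql = in_clause_sql(wanted)
rows = sorted(r[0] for r in conn.execute(sql, wanted))
placeholder_count = sql.count("?")

print(sql)  # SELECT rsid FROM genotypes WHERE rsid IN (?, ?, ?)
print(rows)
conn.close()
```

With a million rsids the same helper would emit a million placeholders, which is exactly what runs into SQLite's parameter limit; the subquery version sidesteps this because the values never leave the database.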