Why is SQLAlchemy count() much slower than the raw query?

后端 未结 3 702
广开言路
广开言路 2020-12-12 17:11

I\'m using SQLAlchemy with a MySQL database and I\'d like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as w

相关标签:
3条回答
  • 2020-12-12 17:37

    Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id)):

    Return a count of rows this Query would return.

    This generates the SQL for this Query as follows:

    SELECT count(1) AS count_1 FROM (
         SELECT <rest of query follows...> ) AS anon_1
    

    For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:

    from sqlalchemy import func
    
    # count User records, without
    # using a subquery.
    session.query(func.count(User.id))
    
    # return count of user "id" grouped
    # by "name"
    session.query(func.count(User.id)).\
            group_by(User.name)
    
    from sqlalchemy import distinct
    
    # count distinct "name" values
    session.query(func.count(distinct(User.name)))
    
    0 讨论(0)
  • 2020-12-12 17:48

    It took me a long time to find this as the solution to my problem. I was getting the following error:

    sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): Incorrect key file for table '/tmp/#sql_40ab_0.MYI'; try to repair it

    The problem was resolved when I changed this:

    query = session.query(rumorClass).filter(rumorClass.exchangeDataState == state)
    return query.count()
    

    to this:

    query = session.query(func.count(rumorClass.id)).filter(rumorClass.exchangeDataState == state)
    return query.scalar()
    
    0 讨论(0)
  • 2020-12-12 17:56

    The reason is that SQLAlchemy's count() is counting the results of a subquery which is still doing the full amount of work to retrieve the rows you are counting. This behavior is agnostic of the underlying database; it isn't a problem with MySQL.

    The SQLAlchemy docs explain how to issue a count without a subquery by importing func from sqlalchemy.

    session.query(func.count(User.id)).scalar()
    
    >>>SELECT count(users.id) AS count_1 \nFROM users')
    
    0 讨论(0)
提交回复
热议问题