OFFSET vs. ROW_NUMBER()

渐次进展 · asked 2020-12-13 02:11

As we know, PostgreSQL's OFFSET requires scanning through all the rows up to the point you requested, which makes it close to useless for deep pagination.
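To make the cost model concrete, here is a minimal, self-contained sketch of LIMIT/OFFSET pagination using the standard-library sqlite3 module (SQLite stands in for PostgreSQL here purely for portability; the LIMIT/OFFSET semantics are the same: the database still has to walk past `offset` rows before returning anything):

```python
import sqlite3

# In-memory table with 1000 rows, standing in for a real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (id integer primary key, d1 text)")
conn.executemany(
    "insert into t1 (id, d1) values (?, ?)",
    [(i, "data %d" % i) for i in range(1, 1001)],
)

def page(limit, offset):
    # Every call re-scans the first `offset` rows just to discard them,
    # so later pages cost more than earlier ones.
    cur = conn.execute(
        "select id from t1 order by id limit ? offset ?", (limit, offset)
    )
    return [row[0] for row in cur.fetchall()]

print(page(5, 0))    # first page: [1, 2, 3, 4, 5]
print(page(5, 995))  # last page: [996, 997, 998, 999, 1000]
```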

2 Answers
  • 2020-12-13 03:04

I've constructed a test which compares OFFSET, cursors, and ROW_NUMBER(). My impression that ROW_NUMBER() would be consistent in speed regardless of where you are in the result set turned out to be correct. However, that speed is dramatically slower than either OFFSET or a CURSOR, which, as I also expected, are pretty much the same in speed, both degrading the further toward the end of the result set you go.

    Results:

    offset(100,100): 0.016359
    scroll(100,100): 0.018393
    rownum(100,100): 15.535614
    
    offset(100,480000): 1.761800
    scroll(100,480000): 1.781913
    rownum(100,480000): 15.158601
    
    offset(100,999900): 3.670898
    scroll(100,999900): 3.664517
    rownum(100,999900): 14.581068
    

    The test script uses sqlalchemy to set up tables and 1000000 rows of test data. It then uses a psycopg2 cursor to execute each SELECT statement and fetch results with the three different methods.

    from sqlalchemy import *
    
    metadata = MetaData()
    engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)
    
    t1 = Table('t1', metadata,
        Column('id', Integer, primary_key=True),
        Column('d1', String(50)),
        Column('d2', String(50)),
        Column('d3', String(50)),
        Column('d4', String(50)),
        Column('d5', String(50))
    )
    
    if not engine.has_table('t1'):
        conn = engine.connect()
        t1.create(conn)
    
        # 1000000 rows
        for i in range(100):
            conn.execute(t1.insert(), [
                dict(
                    ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                    for col in range(1, 6)
            ) for j in range(1, 10001)
            ])
    
    import time
    
    def timeit(fn, count, *args):
        now = time.time()
        for i in range(count):
            fn(*args)
        total = time.time() - now
        print("%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total))
    
    # this is a raw psycopg2 connection.
    conn = engine.raw_connection()
    
    def offset(limit, offset):
        cursor = conn.cursor()
        cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
        cursor.fetchall()
        cursor.close()
    
    def rownum(limit, offset):
        cursor = conn.cursor()
        cursor.execute("select * from (select *, "
                        "row_number() over (order by id asc) as rownum from t1) as foo "
                        "where rownum>=%d and rownum<%d" % (offset, limit + offset))
        cursor.fetchall()
        cursor.close()
    
    def scroll(limit, offset):
        cursor = conn.cursor('foo')
        cursor.execute("select * from t1 order by id")
        cursor.scroll(offset)
        cursor.fetchmany(limit)
        cursor.close()
    
    print()
    
    timeit(offset, 10, 100, 100)
    timeit(scroll, 10, 100, 100)
    timeit(rownum, 10, 100, 100)
    
    print()
    
    timeit(offset, 10, 100, 480000)
    timeit(scroll, 10, 100, 480000)
    timeit(rownum, 10, 100, 480000)
    
    print()
    
    timeit(offset, 10, 100, 999900)
    timeit(scroll, 10, 100, 999900)
    timeit(rownum, 10, 100, 999900)
    
  • 2020-12-13 03:06

    Use a CURSOR for a large result set; it will be much faster. For small result sets the LIMIT/OFFSET construction works fine, but it has its limits.

    ROW_NUMBER() is a nice thing, but not for pagination: you end up with poor performance because of sequential scans.
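    A rough sketch of the cursor-style approach: consume the result set in batches with `fetchmany()` instead of issuing a fresh OFFSET query per page. Again sqlite3 stands in here for portability; with psycopg2 you would open a *named* cursor (`conn.cursor('name')`, as in the test script above) so the server streams rows instead of materializing the whole result client-side.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (id integer primary key)")
    conn.executemany("insert into t1 (id) values (?)",
                     [(i,) for i in range(1, 101)])

    # One query, then pull rows through the cursor in fixed-size batches.
    cur = conn.execute("select id from t1 order by id")
    batches = 0
    total = 0
    while True:
        rows = cur.fetchmany(25)   # 25 rows per round trip
        if not rows:
            break
        batches += 1
        total += len(rows)
    print(batches, total)  # 4 batches, 100 rows
    ```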
