In SQL databases (I use Python + SQLite), how can we make sure that, if we have 1 million rows, a query like

SELECT * FROM mytable WHERE myfunction(description) < ...

does not end up calling myfunction for every single row?
Inspired by @GordThompson's answer, here is a benchmark between:
(1) SELECT * FROM mytable WHERE col2 < 1000 AND myfunction(col1) < 500
vs.
(2) SELECT * FROM mytable WHERE myfunction(col1) < 500 AND col2 < 1000
import sqlite3, time, random

def myfunc(x):
    time.sleep(0.001)  # simulate an expensive function: wait 1 millisecond per call
    return x

# Create an in-memory database exposing myfunc as the SQL function "myfunction"
db = sqlite3.connect(':memory:')
db.create_function("myfunction", 1, myfunc)
c = db.cursor()
c.execute('CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)')
for i in range(10 * 1000):
    a = random.randint(0, 1000)
    c.execute('INSERT INTO mytable VALUES (?, ?)', (a, i))

# Do the evil query and time it
t0 = time.time()
c.execute('SELECT * FROM mytable WHERE col2 < 1000 AND myfunction(col1) < 500')
for e in c.fetchall():
    print(e)
print("Elapsed time: %.2f" % (time.time() - t0))
Result: 1.02 seconds. Since each call sleeps 1 ms, this means that myfunc was called at most ~1000 times, i.e. not for all of the 10k rows but only for the 1000 rows satisfying col2 < 1000.
Same benchmark with

c.execute('SELECT * FROM mytable WHERE myfunction(col1) < 500 AND col2 < 1000')

instead:
Result: 10.05 seconds. This means that myfunc was called ~10k times (10,000 calls × 1 ms ≈ 10 s), i.e. for all of the 10k rows, even those for which the condition col2 < 1000 is not true.
Global conclusion: SQLite does lazy evaluation of AND conditions, i.e. the easy (cheap) condition has to be written first, like this:

... WHERE easy_condition AND expensive_condition
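To double-check this without relying on wall-clock time, here is a minimal sketch (not part of the original benchmark) that counts the calls directly; the global call_count counter and the reuse of the same 10k-row setup are my own illustrative choices:

import sqlite3, random

call_count = 0

def myfunc(x):
    # Count how many times SQLite actually invokes the function
    global call_count
    call_count += 1
    return x

db = sqlite3.connect(':memory:')
db.create_function("myfunction", 1, myfunc)
c = db.cursor()
c.execute('CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)')
for i in range(10 * 1000):
    c.execute('INSERT INTO mytable VALUES (?, ?)', (random.randint(0, 1000), i))

c.execute('SELECT * FROM mytable WHERE col2 < 1000 AND myfunction(col1) < 500')
c.fetchall()
print("calls, cheap condition first:", call_count)      # expected ~1000

call_count = 0
c.execute('SELECT * FROM mytable WHERE myfunction(col1) < 500 AND col2 < 1000')
c.fetchall()
print("calls, expensive condition first:", call_count)  # expected ~10000

Counting the calls this way avoids any noise from the sleep and the row printing in the timing benchmark above.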