How can I best write a query that selects 10 rows randomly from a total of 600k?
I used this http://jan.kneschke.de/projects/mysql/order-by-rand/ posted by Riedsio (i used the case of a stored procedure that returns one or more random values):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
INSERT INTO rands
SELECT r1.id
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article he solves the problem of gaps in ids causing not so random results by maintaining a table (using triggers, etc...see the article); I'm solving the problem by adding another column to the table, populated with contiguous numbers, starting from 1 (edit: this column is added to the temporary table created by the subquery at runtime, doesn't affect your permanent table):
DROP TEMPORARY TABLE IF EXISTS rands;
CREATE TEMPORARY TABLE rands ( rand_id INT );
loop_me: LOOP
IF cnt < 1 THEN
LEAVE loop_me;
END IF;
SET @no_gaps_id := 0;
INSERT INTO rands
SELECT r1.id
FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
(SELECT (RAND() *
(SELECT COUNT(*)
FROM random)) AS id)
AS r2
WHERE r1.no_gaps_id >= r2.id
ORDER BY r1.no_gaps_id ASC
LIMIT 1;
SET cnt = cnt - 1;
END LOOP loop_me;
In the article i can see he went to great lengths to optimize the code; i have no ideea if/how much my changes impact the performance but works very well for me.