How can I optimize MySQL's ORDER BY RAND() function?

天命终不由人 2020-11-22 00:58

I'd like to optimize my queries, so I looked into mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I cannot find a real solu

8 Answers
  • 2020-11-22 01:08

    Try this:

    SELECT  *
    FROM    (
            SELECT  @cnt := COUNT(*) + 1,
                    @lim := 10
            FROM    t_random
            ) vars
    STRAIGHT_JOIN
            (
            SELECT  r.*,
                    @lim := @lim - 1
            FROM    t_random r
            WHERE   (@cnt := @cnt - 1)
                    AND RAND(20090301) < @lim / @cnt
            ) i
    

    This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().

    The main idea here is that we don't sort, but instead keep two variables and calculate the running probability of a row to be selected on the current step.
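
The running-probability idea can be sketched outside SQL. The Python below is only an illustration of the technique (often called selection sampling), not the query itself: `remaining` plays the role of @cnt, `needed` plays the role of @lim, and the function name `sample_rows` is made up for this example.

```python
import random

def sample_rows(rows, limit, rng=random):
    """Pick `limit` rows in a single pass, without sorting."""
    remaining = len(rows)   # rows not yet examined (role of @cnt)
    needed = limit          # rows still to be picked (role of @lim)
    picked = []
    for row in rows:
        # Keep the current row with probability needed / remaining.
        if rng.random() < needed / remaining:
            picked.append(row)
            needed -= 1
        remaining -= 1
    return picked

rows = list(range(1, 1001))
sample = sample_rows(rows, 10, random.Random(20090301))
print(len(sample))  # 10
```

Once `needed` reaches 0 the probability is 0, and when `needed` equals `remaining` it is 1, so the pass returns exactly `limit` rows whenever the table has at least that many.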

    See this article in my blog for more detail:

    • Selecting random rows

    Update:

    If you need to select but a single random record, try this:

    SELECT  aco.*
    FROM    (
            SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
            FROM    (
                    SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                    FROM    accomodation
                    ) q
            ) q2
    JOIN    accomodation aco
    ON      aco.ac_id =
            COALESCE
            (
            (
            SELECT  accomodation.ac_id
            FROM    accomodation
            WHERE   ac_id > randid
                    AND ac_status != 'draft'
                    AND ac_images != 'b:0;'
                    AND NOT EXISTS
                    (
                    SELECT  NULL
                    FROM    accomodation_category
                    WHERE   acat_id = ac_category
                            AND acat_slug = 'vendeglatohely'
                    )
            ORDER BY
                    ac_id
            LIMIT   1
            ),
            (
            SELECT  accomodation.ac_id
            FROM    accomodation
            WHERE   ac_status != 'draft'
                    AND ac_images != 'b:0;'
                    AND NOT EXISTS
                    (
                    SELECT  NULL
                    FROM    accomodation_category
                    WHERE   acat_id = ac_category
                            AND acat_slug = 'vendeglatohely'
                    )
            ORDER BY
                    ac_id
            LIMIT   1
            )
            )
    

    This assumes your ac_id's are distributed more or less evenly.
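
To see why the even-distribution assumption matters, here is a small Python sketch of the same logic: draw a random id between the minimum and maximum, take the first eligible id above it, and fall back to the lowest eligible id if there is none. The function name and the sample ids are hypothetical, chosen only to make the effect of a gap visible.

```python
import bisect
import random
from collections import Counter

def pick_by_random_id(min_id, max_id, eligible_ids, rng=random):
    """eligible_ids: sorted ids of the rows that pass the WHERE filters."""
    # minid + FLOOR((maxid - minid) * RAND())
    rand_id = min_id + int((max_id - min_id) * rng.random())
    # First eligible id strictly greater than rand_id (ac_id > randid ... LIMIT 1).
    pos = bisect.bisect_right(eligible_ids, rand_id)
    if pos == len(eligible_ids):
        return eligible_ids[0]   # COALESCE fallback: lowest eligible id
    return eligible_ids[pos]

# Ids with a large gap between 3 and 100.
eligible = [1, 2, 3, 100]
rng = random.Random(1)
counts = Counter(pick_by_random_id(1, 100, eligible, rng) for _ in range(10_000))
print(counts.most_common())
```

With these ids, almost every draw lands in the gap and resolves to 100, so the row right after a gap is picked far more often than its neighbours.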

  • 2020-11-22 01:13

    Here's how I'd do it:

    -- FLOOR keeps the offset in 0 .. COUNT(*) - 1; ROUND could yield COUNT(*) and return no row
    SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*)
      FROM    accomodation a
      JOIN    accomodation_category c
        ON (a.ac_category = c.acat_id)
      WHERE   a.ac_status != 'draft'
            AND c.acat_slug != 'vendeglatohely'
            AND a.ac_images != 'b:0;')));
    
    SET @sql := CONCAT('
      SELECT  a.ac_id,
            a.ac_status,
            a.ac_name,
            a.ac_status,
            a.ac_images
      FROM    accomodation a
      JOIN    accomodation_category c
        ON (a.ac_category = c.acat_id)
      WHERE   a.ac_status != ''draft''
            AND c.acat_slug != ''vendeglatohely''
            AND a.ac_images != ''b:0;''
      LIMIT ', @r, ', 1');
    
    PREPARE stmt1 FROM @sql;
    
    EXECUTE stmt1;
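
The same count-then-offset pattern can also be driven from application code. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a cut-down, made-up version of the accomodation table, so treat it as an illustration of the pattern rather than a drop-in replacement.

```python
import random
import sqlite3

# In-memory stand-in table; the columns and data are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accomodation (ac_id INTEGER PRIMARY KEY, ac_status TEXT)")
conn.executemany(
    "INSERT INTO accomodation (ac_id, ac_status) VALUES (?, ?)",
    [(i, "draft" if i % 5 == 0 else "live") for i in range(1, 101)],
)

def random_row(conn, rng=random):
    where = "ac_status != 'draft'"
    # Step 1: count the rows that match the filter.
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM accomodation WHERE {where}"
    ).fetchone()
    if count == 0:
        return None
    # Step 2: choose an offset in 0 .. count - 1 and fetch exactly one row there.
    offset = rng.randrange(count)
    return conn.execute(
        f"SELECT ac_id, ac_status FROM accomodation WHERE {where} "
        "ORDER BY ac_id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

row = random_row(conn)
print(row)
```

The offset has to stay strictly below the count, otherwise the second query returns nothing. The database still has to skip over `offset` rows, so the cost grows with the offset.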
    
  • 2020-11-22 01:15

    (Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)

    Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
    Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
    Case: AUTO_INCREMENT with gaps, 1 row returned
    Case: Extra FLOAT column for randomizing
    Case: UUID or MD5 column

    Those 5 cases can be made very efficient for large tables. See my blog for the details.

  • 2020-11-22 01:17

    I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.

    So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.

    SELECT  accomodation.ac_id,
            accomodation.ac_status,
            accomodation.ac_name,
            accomodation.ac_status,
            accomodation.ac_images
    FROM    accomodation, accomodation_category
    WHERE   accomodation.ac_status != 'draft'
            AND accomodation.ac_category = accomodation_category.acat_id
            AND accomodation_category.acat_slug != 'vendeglatohely'
            AND ac_images != 'b:0;'
            AND rand() <= $size * $factor / [accomodation_table_row_count]
    LIMIT $size
    

    $size * $factor / [accomodation_table_row_count] is the probability of picking any given row. rand() generates a random number, and the row is selected if that number is less than or equal to the probability. This effectively performs a random selection that cuts down the number of rows. Since it can return fewer rows than the requested limit, we need to raise the probability to make sure enough rows are selected. Hence we multiply $size by a $factor (I usually set $factor = 2, which works in most cases). Finally, we apply LIMIT $size.
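
A rough way to get a feel for $factor is to simulate the filter. The Python sketch below is an assumption-laden model of the query, not a measurement of MySQL: it treats the table as a plain list, keeps each row with probability size * factor / row_count, and stops after `size` rows the way LIMIT does. The function name `filtered_sample` is made up.

```python
import random

def filtered_sample(rows, size, factor=2, rng=random):
    # Probability of keeping any single row.
    p = size * factor / len(rows)
    picked = []
    for row in rows:
        if rng.random() <= p:
            picked.append(row)
            if len(picked) == size:   # LIMIT $size ends the scan early
                break
    return picked

rows = list(range(5_000))
rng = random.Random(42)
runs = 200
short = sum(len(filtered_sample(rows, 10, 2, rng)) < 10 for _ in range(runs))
print(f"runs returning fewer than 10 rows: {short} of {runs}")
```

Because the scan stops as soon as `size` rows are found, rows near the start of the scan are more likely to be returned than rows near the end. A larger factor makes short results rarer but makes that skew stronger.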

    The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. This would run the fastest, but it is obviously not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I am just doing a simple count + selection. In your case, it would look like this:

    SELECT  accomodation.ac_id,
            accomodation.ac_status,
            accomodation.ac_name,
            accomodation.ac_status,
            accomodation.ac_images
    FROM    accomodation, accomodation_category
    WHERE   accomodation.ac_status != 'draft'
            AND accomodation.ac_category = accomodation_category.acat_id
            AND accomodation_category.acat_slug != 'vendeglatohely'
            AND ac_images != 'b:0;'
            AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
    LIMIT $size
    

    The tricky part is working out the right probability. As you can see, the following expression only calculates a rough temp table size (in fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)). You can refine this logic to get a closer approximation of the table size. Note that it is better to over-select than to under-select rows, i.e. if the probability is set too low, you risk not selecting enough rows.

    This solution runs slower than Quassnoi's since we need to recalculate the table size. However, I find this code a lot more manageable. It is a trade-off between accuracy + performance and coding complexity. Having said that, on large tables this is still far faster than ORDER BY RAND().

    Note: If the query logic permits, perform the random selection as early as possible before any join operations.

  • 2020-11-22 01:18
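    // Assumptions: $db is an open PDO connection, and NUM_OF_ROWS_OR_CLOSE_TO_IT
    // is a constant holding the (approximate) highest rowid.
    function getRandomRow(PDO $db) {
        // Retry until the random id hits an existing row (ids may have gaps).
        // Note: this never terminates if the table is empty.
        do {
            $id  = rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
            $res = getRowById($db, $id);
        } while (empty($res));
        return $res;
    }
    
    // rowid is a key on the table
    function getRowById(PDO $db, $rowid) {
        // `table` is a placeholder for the real table name.
        $stmt = $db->prepare('SELECT * FROM `table` WHERE rowid = ?');
        $stmt->execute([$rowid]);
        return $stmt->fetch(PDO::FETCH_ASSOC);   // false when no row matches
    }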
    
  • 2020-11-22 01:20

    This gives you a single subquery that uses the index to pick a random id; the outer query then fetches the joined row for that id.

    SELECT  accomodation.ac_id,
            accomodation.ac_status,
            accomodation.ac_name,
            accomodation.ac_status,
            accomodation.ac_images
    FROM    accomodation, accomodation_category
    WHERE   accomodation.ac_status != 'draft'
            AND accomodation.ac_category = accomodation_category.acat_id
            AND accomodation_category.acat_slug != 'vendeglatohely'
            AND ac_images != 'b:0;'
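            -- "IS IN" is not valid SQL, and MySQL rejects LIMIT directly inside an IN subquery,
            -- so the random pick is wrapped in a derived table.
            AND accomodation.ac_id IN (
                SELECT ac_id FROM (
                    SELECT ac_id FROM accomodation ORDER BY RAND() LIMIT 1
                ) AS r
            )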
    