How to select one row randomly taking into account a weight?

后端 未结 7 2002
一生所求
一生所求 2020-11-29 11:58

I have a table which looks like that:

id: primary key
content: varchar
weight: int

What I want to do is randomly select one row from this t

7条回答
  •  刺人心
    刺人心 (楼主)
    2020-11-29 12:46

    I have tried van's solution and, although it works, it is not quick.

    My Solution

    The way that I am solving this problem is by maintaining a separate, linked table for the weighting. The basic table structure is similar to this:

    CREATE TABLE `table1` (
      `id` int(11) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      `name` varchar(100),
      `weight` tinyint(4) NOT NULL DEFAULT '1',
    );
    
    CREATE TABLE `table1_weight` (
      `id` bigint(20) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      `table1_id` int(11) NOT NULL
    );
    

    If I have a record in table1 with a weight of 3, then I create 3 records in table1_weight, linked to table1 via the table1_id field. Whatever the value of weight is in table1, that's how many linked records I create in table1_weight.

    Testing

    On a dataset with 976 records in table1 with a total weight of 2031 and hence 2031 records in table1_weight, I ran the following two SQLs:

    1. A version of van's solution

      SELECT t.*
      FROM table1 t
      INNER JOIN
        ( SELECT t.id,
             SUM(tt.weight) AS cum_weight
         FROM table1 t
         INNER JOIN table1 tt ON tt.id <= t.id
         GROUP BY t.id) tc ON tc.id = t.id,
        ( SELECT SUM(weight) AS total_weight
         FROM table1) tt,
        ( SELECT RAND() AS rnd) r
      WHERE r.rnd * tt.total_weight <= tc.cum_weight
      ORDER BY t.id ASC
      LIMIT 1
      
    2. Joining to a secondary table for the weighting

    SELECT t.*
    FROM table1 t
    INNER JOIN table1_weight w
        ON w.table1_id = t.id
    ORDER BY RAND()
    LIMIT 1
    

    SQL 1 takes consistently 0.4 seconds.

    SQL 2 takes between 0.01 and 0.02 seconds.

    Conclusion

    If the speed of selection of a random, weighted record is not an issue, then the single table SQL suggested by van is fine and doesn't have the overhead of maintaining a separate table.

    If, as in my case, a short selection time is critical, then I would recommend the two table method.

提交回复
热议问题