MySQL: Select Random Entry, but Weight Towards Certain Entries

Submitted by 这一生的挚爱 on 2019-12-17 02:11:32

Question


I've got a MySQL table with a bunch of entries in it, and a column called "Multiplier." The default (and most common) value for this column is 0, but it could be any number.

What I need to do is select a single entry from that table at random. However, the rows are weighted according to the number in the "Multiplier" column. A value of 0 means that it's not weighted at all. A value of 1 means that it's weighted twice as much, as if the entry were in the table twice. A value of 2 means that it's weighted three times as much, as if the entry were in the table three times.

I'm trying to modify what my developers have already given me, so sorry if the setup doesn't make a whole lot of sense. I could probably change it but want to keep as much of the existing table setup as possible.

I've been trying to figure out how to do this with SELECT and RAND(), but don't know how to do the weighting. Is it possible?


Answer 1:


This guy asks the same question. He says the same as Frank, but the weightings don't come out right and in the comments someone suggests using ORDER BY -LOG(1.0 - RAND()) / Multiplier, which in my testing gave pretty much perfect results.

(If any mathematicians out there want to explain why this is correct, please enlighten me! But it works.)

The disadvantage would be that you couldn't set the weighting to 0 to temporarily disable an option, as you would end up dividing by zero. But you could always filter it out with a WHERE Multiplier > 0.
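
For anyone who wants the math behind ORDER BY -LOG(1.0 - RAND()) / Multiplier, here is a short sketch of why it should work. It assumes that RAND() is uniform on [0, 1), that the draws for different rows are independent, and that every weight w_i (the Multiplier of row i) is positive. The symbols U_i, X_i and w_i are introduced here for illustration only.

```latex
% Each row's sort key is an exponential random variable with rate w_i:
\begin{align}
X_i &= -\frac{\ln(1 - U_i)}{w_i}, \qquad U_i \sim \mathrm{Uniform}[0, 1), \\
\Pr(X_i > x) &= \Pr\bigl(1 - U_i < e^{-w_i x}\bigr) = e^{-w_i x}, \qquad x \ge 0 .
\end{align}
% Row i comes first in ascending order when X_i is the minimum of all keys:
\begin{align}
\Pr\bigl(X_i = \min_j X_j\bigr)
  &= \int_0^\infty w_i e^{-w_i x} \prod_{j \ne i} e^{-w_j x} \, dx
   = \int_0^\infty w_i e^{-\left(\sum_j w_j\right) x} \, dx
   = \frac{w_i}{\sum_j w_j} .
\end{align}
```

So, under those assumptions, the row sorted first is chosen with probability proportional to its weight, which matches the "pretty much perfect" test results.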




Answer 2:


For much better performance (especially on big tables), first index the weight column and use this query:

SELECT * FROM tbl WHERE id IN 
    (SELECT id FROM (SELECT id FROM tbl ORDER BY -LOG(1-RAND())/weight LIMIT x) t)

Two subqueries are used because MySQL does not yet support LIMIT directly inside an IN subquery; wrapping it in a derived table works around that.

On a 40 MB table the usual query takes 1 s on my i7 machine, and this one takes 0.04 s.




Answer 3:


Don't use 0, 1 and 2 but 1, 2 and 3. Then you can use this value as a multiplier:

SELECT * FROM tablename ORDER BY (RAND() * Multiplier);



Answer 4:


Well, I would put the logic of weights in PHP:

<?php
    $weight_array = array(0, 1, 1, 2, 2, 2);
    $multiplier = $weight_array[array_rand($weight_array)];
?>

and the query:

SELECT *
FROM `table`
WHERE Multiplier = $multiplier
ORDER BY RAND()
LIMIT 1

I think it will work :)




Answer 5:


For others Googling this subject, I believe you can also do something like this:

SELECT strategy_id
FROM weighted_strategies AS t1 
WHERE (
   SELECT SUM(weight) 
   FROM weighted_strategies AS t2 
   WHERE t2.strategy_id<=t1.strategy_id
)>@RAND AND 
weight>0
ORDER BY t1.strategy_id
LIMIT 1

If the weights of all records sum to n, @RAND should be a random integer between 0 and n-1 inclusive.

@RAND could be set in SQL or inserted as an integer value from the calling code.

The subselect sums the weights of all records up to and including the current one and checks whether that running total exceeds the random value supplied. The first record (in strategy_id order) for which it does is the one selected.
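
The running-total idea can be sketched outside the database as well. The following is a minimal Python illustration, not the answer's own code: the function name pick_weighted and the (strategy_id, weight) tuple layout are assumptions made for the example, and weights are assumed to be non-negative integers with at least one positive value.

```python
import bisect
import itertools
import random


def pick_weighted(rows, rng=random):
    """Pick one strategy_id with probability proportional to its weight.

    rows: list of (strategy_id, weight) tuples, ordered by strategy_id.
    """
    # Mirror the "weight > 0" filter from the query.
    candidates = [(sid, w) for sid, w in rows if w > 0]
    # Running totals play the role of the correlated SUM(weight) subquery.
    running = list(itertools.accumulate(w for _, w in candidates))
    total = running[-1]
    # Random integer in 0 .. total-1, the same role as @RAND.
    r = rng.randrange(total)
    # Index of the first running total strictly greater than r.
    idx = bisect.bisect_right(running, r)
    return candidates[idx][0]


if __name__ == "__main__":
    rows = [(1, 0), (2, 1), (3, 3)]
    print(pick_weighted(rows))
```

Because the running totals are sorted, the lookup is a binary search here, whereas the SQL version re-evaluates the subquery for each row.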




Answer 6:


<?php
/**
 * Demonstration of weighted random selection of MySQL database.
 */
$conn = mysqli_connect('localhost', 'root', '');

// prepare table and data.
mysqli_select_db($conn, 'test');
mysqli_query($conn, "drop table if exists temp_wrs");
mysqli_query($conn, "create table temp_wrs (
    id int not null auto_increment,
    val varchar(16),
    weight tinyint,
    upto smallint,
    primary key (id)
)");
$base_data = array(    // value-weight pair array.
    'A' => 5,
    'B' => 3,
    'C' => 2,
    'D' => 7,
    'E' => 6,
    'F' => 3,
    'G' => 5,
    'H' => 4
);
foreach($base_data as $val => $weight) {
    mysqli_query($conn, "insert into temp_wrs (val, weight) values ('".$val."', ".$weight.")");
}

// calculate the sum of weight.
$rs = mysqli_query($conn, 'select sum(weight) as s from temp_wrs');
$row = mysqli_fetch_assoc($rs);
$sum = (int) $row['s'];
mysqli_free_result($rs);

// update range based on their weight.
// each row's "upto" column is set to the running sum of weights up to and including that row.
mysqli_query($conn, "update temp_wrs a, (
    select id, (select sum(weight) from temp_wrs where id <= i.id) as subsum from temp_wrs i
) b
set a.upto = b.subsum
where a.id = b.id");

$result = array();
foreach($base_data as $val => $weight) {
    $result[$val] = 0;
}
// do weighted random select ($sum * $times) times.
$times = 100;
$loop_count = $sum * $times;
for($i = 0; $i < $loop_count; $i++) {
    $rand = rand(0, $sum-1);
    // select the first row whose "upto" exceeds $rand, i.e. the row that $rand points to.
    $rs = mysqli_query($conn, 'select * from temp_wrs where upto > '.$rand.' order by id limit 1');
    $row = mysqli_fetch_assoc($rs);
    $result[$row['val']] += 1;
    mysqli_free_result($rs);
}

// clean up.
mysqli_query($conn, "drop table if exists temp_wrs");
mysqli_close($conn);
?>
<table>
    <thead>
        <tr>
            <th>DATA</th>
            <th>WEIGHT</th>
            <th>ACTUALLY SELECTED<br />BY <?php echo $loop_count; ?> TIMES</th>
        </tr>
    </thead>
    <tbody>
    <?php foreach($base_data as $val => $weight) : ?>
        <tr>
            <th><?php echo $val; ?></th>
            <td><?php echo $weight; ?></td>
            <td><?php echo $result[$val]; ?></td>
        </tr>
    <?php endforeach; ?>
    </tbody>
</table>

If you want to select N rows, repeat the following for each row you pick:

  1. Recalculate the sum.
  2. Reset the ranges (the "upto" column).
  3. Select the row that $rand points to.

Previously selected rows should be excluded on each pass of the selection loop, for example with where ... id not in (3, 5);
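
As a rough in-memory illustration of that N-row loop, here is a minimal Python sketch. It is not a port of the PHP demo: the function name pick_n_weighted and the dict-of-weights input are assumptions made for the example, and weights are assumed to be positive integers.

```python
import random


def pick_n_weighted(weights, n, rng=random):
    """Pick up to n distinct values, each pick weighted among those remaining.

    weights: dict mapping value -> positive integer weight.
    """
    remaining = dict(weights)
    chosen = []
    for _ in range(min(n, len(remaining))):
        # Step 1: recalculate the sum over the rows still available.
        total = sum(remaining.values())
        # Same role as rand(0, $sum - 1) in the PHP demo.
        r = rng.randrange(total)
        # Steps 2 and 3: rebuild the running "upto" value and stop at the
        # first row whose "upto" exceeds r.
        upto = 0
        for val, weight in remaining.items():
            upto += weight
            if upto > r:
                chosen.append(val)
                break
        # Exclude the selected row from later picks (the "id not in" part).
        del remaining[chosen[-1]]
    return chosen


if __name__ == "__main__":
    print(pick_n_weighted({'A': 5, 'B': 3, 'C': 2, 'D': 7}, 2))
```

Each pick rescans the remaining rows, so this suits small tables; it is meant to show the bookkeeping, not to be fast.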




Answer 7:


SELECT * FROM tablename ORDER BY -LOG(RAND()) / Multiplier;

Is the one which gives you the correct distribution.

SELECT * FROM tablename ORDER BY (RAND() * Multiplier);

Gives you the wrong distribution.

For example, suppose there are two entries, A and B, in the table. A has weight 100 and B has weight 200. The first query (exponential random variables) gives Pr(A wins) = 1/3, while the second gives 1/4, which is not correct. I wish I could show you the math, but I do not have enough rep to post the relevant link.
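
The two figures can be checked empirically. Below is a small Python simulation, offered as a sketch only: the function names are made up for the example, "A wins" is taken to mean the smallest key for the -LOG(RAND()) / Multiplier ordering and the largest key for the RAND() * Multiplier ordering, and 1.0 - rng.random() is used in place of RAND() so that the logarithm never receives zero.

```python
import math
import random


def prob_a_wins_exponential(w_a, w_b, trials, rng):
    """Fraction of trials in which A has the smaller key -log(U) / w."""
    wins = 0
    for _ in range(trials):
        # 1.0 - random() lies in (0, 1], so math.log never sees 0.
        key_a = -math.log(1.0 - rng.random()) / w_a
        key_b = -math.log(1.0 - rng.random()) / w_b
        wins += key_a < key_b
    return wins / trials


def prob_a_wins_product(w_a, w_b, trials, rng):
    """Fraction of trials in which A has the larger key U * w."""
    wins = 0
    for _ in range(trials):
        key_a = rng.random() * w_a
        key_b = rng.random() * w_b
        wins += key_a > key_b
    return wins / trials


if __name__ == "__main__":
    rng = random.Random()
    print("exponential keys:", prob_a_wins_exponential(100, 200, 100_000, rng))
    print("product keys:    ", prob_a_wins_product(100, 200, 100_000, rng))
```

With weights 100 and 200, the first estimate should land near 1/3 and the second near 1/4, in line with the figures above.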




Answer 8:


Whatever you do, it is going to be terrible because it will involve:

  * Getting the total "weights" for all rows as ONE number (including applying the multiplier).
  * Getting a random number between 0 and that total.
  * Getting all entries and running along them, deducting each entry's weight from the random number and choosing the entry at which the random number is used up.

On average you will run along half the table. Performance will be SLOW unless the table is small, in which case do it outside MySQL, in memory.
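
For the small-table case, doing the selection in memory might look like the Python sketch below. It assumes the rows have already been fetched as (name, Multiplier) pairs and that, as in the question, a Multiplier of 0 means a weight of 1, a Multiplier of 1 means a weight of 2, and so on. The function name pick_in_memory is made up for the example; random.choices is the standard-library call that does the weighted draw.

```python
import random


def pick_in_memory(rows, rng=random):
    """Pick one name from (name, multiplier) pairs, weighted by multiplier + 1."""
    names = [name for name, _ in rows]
    # Per the question: Multiplier 0 counts once, 1 counts twice, 2 three times.
    weights = [multiplier + 1 for _, multiplier in rows]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rows = [("plain", 0), ("double", 1), ("triple", 2)]
    print(pick_in_memory(rows))
```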




Answer 9:


The result of the pseudo-code (rand(1, num) % rand(1, num)) is weighted toward 0 and away from num. Subtract the result from num to get the opposite weighting.

So if my application language is PHP, it should look something like this:

// $conn is assumed to be an open mysqli connection.
$arr = mysqli_fetch_assoc(mysqli_query($conn,
    'SELECT MAX(`Multiplier`) AS `max_mul` FROM tbl'
));
$MaxMul = (int) $arr['max_mul']; // Holds the maximum value of the Multiplier column

$mul = $MaxMul - ( rand(1, $MaxMul) % rand(1, $MaxMul) );

mysqli_query($conn, "SELECT * FROM tbl WHERE Multiplier=$mul ORDER BY RAND() LIMIT 1");

Explanation of the code above:

  1. Fetch the highest value in the Multiplier column.
  2. Calculate a random Multiplier value (weighted toward the maximum value in the Multiplier column).
  3. Fetch a random row which has that Multiplier value.

This can also be done in MySQL alone.

Proving that the pseudo-code (rand(1, num) % rand(1, num)) will weight toward 0: Execute the following PHP code to see why (in this example, 16 is the highest number):

$v = array();

for($i=1; $i<=16; ++$i)
    for($k=1; $k<=16; ++$k)
        isset($v[$i % $k]) ? ++$v[$i % $k] : ($v[$i % $k] = 1);

foreach($v as $num => $times)
        echo '<div style="margin-left:', $times  ,'px">
              times: ',$times,' @ num = ', $num ,'</div>';



Answer 10:


While I realise this is a question on MySQL, the following may be useful for someone using SQLite3, which has subtly different implementations of RANDOM and LOG.

SELECT * FROM tbl ORDER BY (-LOG((abs(RANDOM() % 10000) + 1) / 10000.0) / weight) LIMIT 1;

weight is a column in tbl containing integers (I've used 1-100 as the range in my table).

RANDOM() in SQLite produces integers between about -9.2E18 and +9.2E18 (see the SQLite docs for more info). I used the modulo operator to bring the range down, then added 1 and divided by 10000.0 so that the value passed to LOG lies between 0 and 1. The scaling matters: the weighting only comes out right when LOG receives a value in that range.

abs() removes the negatives, and the + 1 rules out zero, to avoid problems with LOG, which only handles positive numbers.

LOG() is not actually present in a default install of SQLite3. I used the PHP SQLite3::createFunction method to make PHP's log function available in SQL. See the PHP docs for info on this.
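
The same trick of registering a log function from the host language can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the answer's PHP setup: the table tbl, its columns and the helper names are made up for the example, math.log is registered under the name LOG (replacing any built-in log the SQLite build may have), and the query scales the random value into (0, 1] before taking the logarithm.

```python
import math
import sqlite3

# Weighted pick: the smallest -LOG(u) / weight wins, with u in (0, 1].
QUERY = """
SELECT name FROM tbl
ORDER BY -LOG((abs(RANDOM() % 10000) + 1) / 10000.0) / weight
LIMIT 1
"""


def make_connection(rows):
    """Create an in-memory database with a LOG function and a small table."""
    conn = sqlite3.connect(":memory:")
    # Expose Python's natural logarithm to SQL as a one-argument LOG().
    conn.create_function("LOG", 1, math.log)
    conn.execute("CREATE TABLE tbl (name TEXT, weight INTEGER)")
    conn.executemany("INSERT INTO tbl (name, weight) VALUES (?, ?)", rows)
    return conn


def pick(conn):
    """Run the weighted query once and return the selected name."""
    return conn.execute(QUERY).fetchone()[0]


if __name__ == "__main__":
    conn = make_connection([("A", 1), ("B", 3)])
    print(pick(conn))
    conn.close()
```

Because the random value is quantised to 10000 steps, the result is only approximately proportional to weight, which is usually close enough for this purpose.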



Source: https://stackoverflow.com/questions/2417621/mysql-select-random-entry-but-weight-towards-certain-entries
