Optimizing my MySQL statement! - RAND() TOO SLOW


孤独总比滥情好 2020-12-11 03:26

I have a table called system with over 80,000 records, and another table called follows.

I need my statement to randomly select records from

6 answers

情书的邮戳 (original poster)

2020-12-11 03:58

The query is slow because the database has to generate a random value for every row and keep all of those values, along with their row data, before it can return even a single row. What you can do instead is limit the number of candidate rows first with WHERE RAND() < x, choosing x so that the filter is likely to return at least as many rows as you need. To get a truly random sample you would then order the survivors by RAND() again, or sample from the returned dataset.
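To make the choice of x concrete, here is a minimal Python sketch that computes a threshold and builds the MySQL statement as a string. The helper names, the sample size of 100, and the safety factor of 3 are my own assumptions for illustration, not something from the question; only the table name system and the 80,000 row count come from the thread.

```python
def sampling_threshold(total_rows: int, wanted: int, safety: float = 3.0) -> float:
    """Return x for `WHERE RAND() < x`.

    The filter keeps about total_rows * x rows, so this aims for roughly
    `safety` times the number of rows actually wanted (safety is an assumption).
    """
    if total_rows <= 0 or wanted <= 0:
        raise ValueError("total_rows and wanted must be positive")
    return min(1.0, safety * wanted / total_rows)


def build_sample_query(table: str, total_rows: int, wanted: int,
                       safety: float = 3.0) -> str:
    """Build the two-stage query: thin the table first, then shuffle the survivors."""
    x = sampling_threshold(total_rows, wanted, safety)
    # WHERE RAND() < x discards most rows during the scan; ORDER BY RAND()
    # then only has to sort the few rows that passed the filter.
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE RAND() < {x:.6f} "
        f"ORDER BY RAND() LIMIT {wanted}"
    )


query = build_sample_query("system", total_rows=80_000, wanted=100)
print(query)
```

With these numbers about 300 of the 80,000 rows are expected to pass the filter, so the final sort works on hundreds of rows rather than the whole table. A larger safety factor makes a shortfall less likely at the cost of a slightly bigger sort.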

This approach lets the database process the query in a streaming fashion, without building a large intermediate representation of all the data. The drawback is that you can never be 100% sure of getting the number of samples you need, so you may have to rerun the query until you do, live with a smaller sample set, or incrementally add samples (taking care to avoid duplicates) until you have enough.
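The "incrementally add samples" option can be sketched as a small loop. This is a simulation in plain Python, not real database code: fetch_candidates is a hypothetical stand-in for running SELECT id ... WHERE RAND() < x against MySQL, and the ids, x and round limit are assumed values.

```python
import random


def fetch_candidates(ids, x, rng):
    """Stand-in for `WHERE RAND() < x`: keep each row independently with probability x."""
    return [row_id for row_id in ids if rng.random() < x]


def sample_with_top_up(ids, wanted, x, rng, max_rounds=50):
    """Repeat the filtered query, adding unseen ids until `wanted` are collected."""
    if wanted > len(ids):
        raise ValueError("cannot sample more rows than the table holds")
    chosen = set()  # a set silently drops ids already picked in an earlier round
    for _ in range(max_rounds):
        candidates = fetch_candidates(ids, x, rng)
        rng.shuffle(candidates)  # so a cut-off mid-round does not favour low ids
        for row_id in candidates:
            if len(chosen) == wanted:
                break
            chosen.add(row_id)
        if len(chosen) == wanted:
            return sorted(chosen)
    raise RuntimeError("too few candidates; try a larger x")


ids = list(range(1, 80_001))
sample = sample_with_top_up(ids, wanted=100, x=0.001, rng=random.Random(42))
print(len(sample))
```

Here x is deliberately a little too small (about 80 expected rows per round for 100 wanted), so the loop normally needs a second round, which is exactly the situation described above.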

If you don't need the query to return different results on each call, you could also add an indexed column holding a pre-generated random value and combine it with the technique above. That lets you draw any number of samples fairly, even as rows are added or deleted, but the same query on the same data will of course return the same result set.
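A rough sketch of the pre-generated column idea follows. To keep it runnable without a MySQL server it uses Python's built-in sqlite3 module as a stand-in, so the SQL dialect differs slightly from MySQL; the column name rand_key, the index name and the helper function are hypothetical, and ordering by the indexed column with a LIMIT is just one way to read "any number of samples".

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "system" (id INTEGER PRIMARY KEY, rand_key REAL NOT NULL)')

# Pre-generate one random value per row at insert time (assumed column: rand_key).
rng = random.Random(2020)
conn.executemany(
    'INSERT INTO "system" (id, rand_key) VALUES (?, ?)',
    [(row_id, rng.random()) for row_id in range(1, 80_001)],
)
# The index lets the database walk rows in rand_key order without sorting them all.
conn.execute('CREATE INDEX idx_system_rand_key ON "system" (rand_key)')


def indexed_sample(connection, wanted):
    """Return the `wanted` rows with the smallest pre-generated random values."""
    rows = connection.execute(
        'SELECT id FROM "system" ORDER BY rand_key LIMIT ?', (wanted,)
    ).fetchall()
    return [row[0] for row in rows]


first = indexed_sample(conn, 100)
second = indexed_sample(conn, 100)
print(first == second)
```

The two calls return the same ids because nothing in the query is random any more; the randomness was fixed when rand_key was filled in. New rows would need their own rand_key value on insert for the sample to stay fair.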
