MSSQL Select Random in Large Data

假装没事ソ submitted on 2020-01-25 16:21:28

Question


I have a table with more than 1 million records, and I want to select random rows from it. The rows should not come from the whole table, only from the results that match certain conditions.

Performance is very important, so I cannot order by NEWID() and then take the first item.

The table structure is something like this:

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

Now, I wrote a query like:

with 
    tmp_one as
    (
        SELECT
                R.Id as RID 
                FROM    [User] as U
                            Inner Join
                        [Item] as R
                            On  R.UserId = U.Id

                WHERE       ([R].[Level] BETWEEN @MinLevel AND @MaxLevel) 
                        AND ((ABS((BINARY_CHECKSUM(NEWID(),R.Id,NEWID())))% 10000)/100 ) > @RangeOne
    ),
    tmp_two as
    (
        Select  tmp_one.RID as RID
            From    tmp_one
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),RID,NEWID())))% 10000)/100 ) > @RangeTwo
    ),
    tmp_three as
    (
        Select  RID as RID 
            From    tmp_two
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),NEWID())))% 10000)/100 ) < @RangeThree
    )
    Select  top 10 RID
        From    tmp_three

I tried to select 10 items randomly and then pick one of them, but I ran into a surprising problem.

Sometimes the output is ordered by item level, which I don't want (it's not really random). I really don't know how the result ended up ordered by level.

Please suggest a solution that selects random records with high performance and does not return duplicates over a large number of iterations.


Answer 1:


Based on MSDN's Selecting Rows Randomly from a Large Table, instead of the query you want to avoid:

select top 10 * from TableName order by newid()

It suggests this:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It performs far fewer logical reads and is much faster.
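
As a rough sketch of how this idea could be applied to the query in the question: filter by level, keep roughly 1 percent of the matching rows with a per-row NEWID() checksum, then shuffle only that small sample. TOP without ORDER BY returns rows in whatever order the execution plan produces them, which is one possible reason the original output sometimes looked sorted by level. The 1 percent rate and the example values for @MinLevel and @MaxLevel are assumptions to tune, not values from the question.

```sql
-- Sketch only: table and column names are taken from the question.
-- The level bounds below are assumed example values.
DECLARE @MinLevel int = 1, @MaxLevel int = 5;

SELECT TOP (10) S.RID
FROM (
    SELECT R.Id AS RID
    FROM [User] AS U
    INNER JOIN [Item] AS R ON R.UserId = U.Id
    WHERE R.[Level] BETWEEN @MinLevel AND @MaxLevel
      -- NEWID() is evaluated per row, so every row gets its own draw.
      -- Taking the modulo before ABS avoids an overflow on the smallest int.
      -- "< 1" keeps about 1 row in 100 (assumed sample rate).
      AND ABS(CHECKSUM(NEWID()) % 100) < 1
) AS S
ORDER BY NEWID();   -- sorts only the small sample, not the whole table
```

If the sample holds fewer than 10 rows, the query returns fewer than 10; raising the sample rate trades some speed for a larger pool.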




Answer 2:


Try something like this. It will randomly grab 10 rows from your table.

This is pseudo code, so you might need to fix a few column names to match your real tables.

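DECLARE @Random int
DECLARE @Result table
(ID BIGINT,
Title nvarchar(100),
Level int,
Point int)

-- Count the candidate rows once, before the loop
declare @TotalRows int
set @TotalRows = (select COUNT(*) From [User] U inner join [Item] R on R.UserID = U.ID)

-- Stop at 10 rows, or earlier if fewer than 10 rows exist
while (select COUNT(*) from @Result) < case when @TotalRows < 10 then @TotalRows else 10 end
begin
-- Pick a row position between 1 and @TotalRows
set @Random = (select floor(RAND() * @TotalRows+1))

-- T1 holds the first @Random rows and T2 the first @Random - 1 rows,
-- so the only T1 row with no match in T2 is row number @Random.
-- ORDER BY makes "the first N rows" well defined (assumes R.ID is unique).
insert into @Result
select T1.ID, T1.Title, T1.Level, T1.Point from
(select top (@Random) R.ID, R.Title, R.Level, R.Point From [User] U inner join [Item] R on R.UserID = U.ID order by R.ID) T1
left outer join (select top (@Random - 1) R.ID From [User] U inner join [Item] R on R.UserID = U.ID order by R.ID) T2 on T2.ID = T1.ID
where T2.ID is null
-- Skip rows that were already picked, so the result has no duplicates
and not exists (select 1 from @Result X where X.ID = T1.ID)

end

select * from @Result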

Here is how it works.

Select a random number, for example 47.
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.
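
On SQL Server 2012 or later, the "take the Nth row" step described above can also be written with OFFSET ... FETCH instead of two TOP subqueries. This is a different technique from the one in the answer, shown only as a sketch. It assumes Item.ID is unique, so that ordering by it is stable, and it reuses the table and column names from the answer.

```sql
-- Sketch only: requires SQL Server 2012 or later for OFFSET ... FETCH.
DECLARE @TotalRows int =
    (SELECT COUNT(*) FROM [User] AS U INNER JOIN [Item] AS R ON R.UserID = U.ID);

-- RAND() returns a float in [0, 1), so this is a position from 0 to @TotalRows - 1
DECLARE @Random int = FLOOR(RAND() * @TotalRows);

SELECT R.ID, R.Title, R.[Level], R.Point
FROM [User] AS U
INNER JOIN [Item] AS R ON R.UserID = U.ID
ORDER BY R.ID                                  -- OFFSET requires an ORDER BY
OFFSET @Random ROWS FETCH NEXT 1 ROWS ONLY;    -- skip @Random rows, return the next one
```

Running this once per wanted row still scans up to the chosen offset each time, so it simplifies the query text more than it reduces the work.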


Source: https://stackoverflow.com/questions/26596045/mssql-select-random-in-large-data
