SQL random sample with groups

天涯浪人 2021-02-05 18:59

I have a university graduate database and would like to extract a random sample of around 1000 records.

I want to ensure the sample is representative of the pop

4 answers

旧时难觅i (original poster)

2021-02-05 19:35

You want a stratified sample. I would recommend sorting the data by course code and taking every nth row. Here is one method that works best if you have a large population size (with fewer than 500 rows, the integer division cnt / 500 evaluates to 0 and the modulo fails):

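-- Number the rows in course-code order (random order within each course),
-- then keep every (cnt / 500)-th row. 500 is the target sample size.
select d.*
from (select d.*,
             row_number() over (order by coursecode, newid()) as seqnum,
             count(*) over () as cnt
      from degree d
     ) d
where seqnum % (cnt / 500) = 1;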
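
To see how the nth-row approach spreads the sample across courses, here is a minimal Python sketch that runs the same query shape against an in-memory SQLite table. Everything in it is an assumption for illustration: the course counts are made up, the target is 50 rather than 500, and SQLite's random() stands in for newid. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

TARGET = 50  # hypothetical target sample size (the query above uses 500)

conn = sqlite3.connect(":memory:")
conn.execute("create table degree (id integer primary key, coursecode text)")
# Made-up population of 1000 graduates across three courses.
population = [("ART",)] * 615 + [("BIO",)] * 290 + [("CHEM",)] * 95
conn.executemany("insert into degree (coursecode) values (?)", population)

# Same shape as the query above; random() replaces newid for SQLite.
query = """
select coursecode, count(*) as sampled
from (select d.*,
             row_number() over (order by coursecode, random()) as seqnum,
             count(*) over () as cnt
      from degree d
     ) d
where seqnum % (cnt / ?) = 1
group by coursecode
order by coursecode
"""
nth_counts = dict(conn.execute(query, (TARGET,)).fetchall())
print(nth_counts)  # {'ART': 31, 'BIO': 15, 'CHEM': 4}
```

Each course ends up with roughly its share of the 50 rows, because every 20th row is taken from a list that is already grouped by course.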

EDIT:

You can also calculate the population size for each group "on the fly":

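-- Number the rows randomly within each course, then keep each course's
-- proportional share of the 500-row target (seqnum starts at 1, hence <=).
select d.*
from (select d.*,
             row_number() over (partition by coursecode order by newid()) as seqnum,
             count(*) over () as cnt,
             count(*) over (partition by coursecode) as cc_cnt
      from degree d
     ) d
where seqnum <= 500 * (cc_cnt * 1.0 / cnt);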
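
The per-group quota can be checked the same way. This is a rough Python sketch, not a definitive implementation: it uses an in-memory SQLite table with made-up course counts, a hypothetical target of 50, random() in place of newid, and a `<=` comparison so that each course keeps the floor of its proportional share. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

TARGET = 50  # hypothetical target sample size (the query above uses 500)

conn = sqlite3.connect(":memory:")
conn.execute("create table degree (id integer primary key, coursecode text)")
# Made-up population of 1000 graduates across three courses.
population = [("ART",)] * 615 + [("BIO",)] * 290 + [("CHEM",)] * 95
conn.executemany("insert into degree (coursecode) values (?)", population)

# Proportional shares of 50 are 30.75, 14.5 and 4.75; seqnum <= share
# keeps the whole-number part of each.
query = """
select coursecode, count(*) as sampled
from (select d.*,
             row_number() over (partition by coursecode order by random()) as seqnum,
             count(*) over () as cnt,
             count(*) over (partition by coursecode) as cc_cnt
      from degree d
     ) d
where seqnum <= ? * (cc_cnt * 1.0 / cnt)
group by coursecode
order by coursecode
"""
quota_counts = dict(conn.execute(query, (TARGET,)).fetchall())
print(quota_counts)  # {'ART': 30, 'BIO': 14, 'CHEM': 4}
```

Because each quota is rounded down, the total here is 48 rather than 50, so expect the real sample to land slightly under the target.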
