SQL random sample with groups

天涯浪人 2021-02-05 18:59

I have a university graduate database and would like to extract a random sample of data of around 1000 records.

I want to ensure the sample is representative of the pop

4 answers
  •  旧时难觅i
    2021-02-05 19:35

    You want a stratified sample. I would recommend sorting the data by course code and then taking every nth row. Here is one method, which works best when the population is large:

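    -- 500 is the target sample size; cnt / 500 is the sampling interval.
    -- Assumes cnt is well above 500: integer division makes the interval 0 or 1 otherwise.
    select d.*
    from (select d.*,
                 row_number() over (order by coursecode, newid()) as seqnum,
                 count(*) over () as cnt
          from degree d
         ) d
    where seqnum % (cnt / 500) = 1;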
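
To see why taking every nth row of a course-sorted list keeps the course proportions, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not the query itself: the population, the course codes A/B/C, and the function name `nth_sample_by_group` are hypothetical, and Python's `random` stands in for `newid()`.

```python
import random
from collections import Counter

def nth_sample_by_group(rows, group_of, target, rng):
    """Sort by group with a random tiebreak, then keep every step-th row."""
    # Same ordering idea as row_number() over (order by coursecode, newid()).
    ordered = sorted(rows, key=lambda row: (group_of(row), rng.random()))
    # Sampling interval; max(..., 1) is an added guard for small populations.
    step = max(len(ordered) // target, 1)
    # seqnum is 1-based; (seqnum - 1) % step == 0 matches seqnum % step = 1 when step > 1.
    return [row for seqnum, row in enumerate(ordered, start=1)
            if (seqnum - 1) % step == 0]

# Hypothetical population: 10,000 graduates across three made-up course codes.
codes = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 2000
population = list(enumerate(codes))  # (student_id, coursecode) pairs

nth_sample = nth_sample_by_group(population, lambda row: row[1], 500, random.Random(0))
nth_counts = Counter(code for _, code in nth_sample)
print(len(nth_sample), dict(nth_counts))
```

With this made-up population the interval is 20, so the sample has 500 rows split 250/150/100, the same 50/30/20 proportions as the population. Which rows are picked within each course depends on the random tiebreak.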
    

    EDIT:

    You can also calculate the population size for each group "on the fly":

    select d.*
    from (select d.*,
                 row_number() over (partition by coursecode order by newid()) as seqnum,
                 count(*) over () as cnt,
                 count(*) over (partition by coursecode) as cc_cnt
          from degree d
         ) d
    -- keep rows numbered below each course's proportional share of the 500-row target
    where seqnum < 500 * (cc_cnt * 1.0 / cnt)
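
The per-course quota can be checked the same way. The sketch below is a hedged Python rendering of the condition `seqnum < 500 * (cc_cnt * 1.0 / cnt)`. The data, the course codes, and the name `quota_sample_by_group` are hypothetical, and `rng.shuffle` plays the role of `order by newid()`.

```python
import random
from collections import Counter, defaultdict

def quota_sample_by_group(rows, group_of, target, rng):
    """Shuffle each group, then keep rows numbered below the group's quota."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    total = len(rows)  # cnt in the query
    sample = []
    for members in groups.values():
        rng.shuffle(members)  # random order within the group
        quota = target * len(members) / total  # target * (cc_cnt / cnt)
        # seqnum is 1-based, like row_number(); strict < mirrors the query.
        sample.extend(row for seqnum, row in enumerate(members, start=1)
                      if seqnum < quota)
    return sample

# Hypothetical population: 10,000 graduates across three made-up course codes.
codes = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 2000
population = list(enumerate(codes))  # (student_id, coursecode) pairs

quota_sample = quota_sample_by_group(population, lambda row: row[1], 500, random.Random(0))
quota_counts = Counter(code for _, code in quota_sample)
print(len(quota_sample), dict(quota_counts))
```

With these numbers the quotas are 250, 150 and 100, and the strict `<` keeps 249, 149 and 99 rows (497 in total). Using `<=` instead would return exactly the quota whenever it is a whole number.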
    
