I have a university graduate database and would like to extract a random sample of data of around 1000 records.
I want to ensure the sample is representative of the population.
You want a stratified sample. I would recommend sorting the data by course code and then taking every nth row. Here is one method. It works best with a large population, because cnt / 500 is integer division and the step needs to be at least 2 for the modulo test to match any rows:
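    select d.*
    from (select d.*,
                 -- newid() puts the rows of each course code in random order
                 row_number() over (order by coursecode, newid()) as seqnum,
                 count(*) over () as cnt
          from degree d
         ) d
    -- keep every (cnt / 500)th row; 500 is the target sample size (use 1000 for the sample in the question)
    where seqnum % (cnt / 500) = 1;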
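To see why sorting by course code before taking every nth row spreads the sample across courses, here is a minimal Python sketch of the same logic. It is not the query itself: the function name `systematic_sample` and its parameters `stratum_key`, `target` and `seed` are made up for illustration, and the records are assumed to be an in-memory list.

```python
import random


def systematic_sample(records, stratum_key, target, seed=None):
    """Sort by stratum (random order inside each stratum), then keep every nth row."""
    rng = random.Random(seed)
    # The random second sort key plays the role of newid() in the ORDER BY.
    ordered = sorted(records, key=lambda r: (stratum_key(r), rng.random()))
    # Integer division, like cnt / 500 in the query.
    step = len(ordered) // target
    if step < 1:
        # Fewer rows than the target: return everything (an assumption of this
        # sketch; the query would fail with a divide-by-zero here).
        return ordered
    # Indices 0, step, 2*step, ... are the rows with seqnum % step = 1
    # when seqnum starts at 1 and step is at least 2.
    return ordered[::step]
```

With 10,000 rows and a target of 500 the step is 20, so a course that holds 5,000 of the rows contributes 250 of the 500 sampled rows. Each course ends up represented roughly in proportion to its size without any per-course counting.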
EDIT:
You can also calculate the population size for each group "on the fly":
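    select d.*
    from (select d.*,
                 -- number the rows of each course code in random order
                 row_number() over (partition by coursecode order by newid()) as seqnum,
                 count(*) over () as cnt,
                 count(*) over (partition by coursecode) as cc_cnt
          from degree d
         ) d
    -- each course code gets a share of the 500 rows proportional to its size
    where seqnum < 500 * (cc_cnt * 1.0 / cnt);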
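The arithmetic behind this second query is proportional allocation: each course code gets a quota of 500 times its share of the population. The following Python sketch mirrors that filter under the same assumptions as before (in-memory records; `proportional_sample`, `stratum_key`, `target` and `seed` are illustrative names, not part of the query).

```python
import random
from collections import defaultdict


def proportional_sample(records, stratum_key, target, seed=None):
    """Take a random share of each stratum proportional to the stratum's size."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[stratum_key(r)].append(r)
    total = len(records)  # cnt in the query
    sample = []
    for rows in groups.values():
        rng.shuffle(rows)  # random order within the group, like ORDER BY newid()
        # Quota for this group: target * (cc_cnt * 1.0 / cnt)
        quota = target * len(rows) / total
        # Same strict comparison as the WHERE clause, with seqnum starting at 1.
        sample.extend(
            row for seqnum, row in enumerate(rows, start=1) if seqnum < quota
        )
    return sample
```

Because the comparison is strict, a group whose quota is a whole number contributes one row fewer than that quota (a quota of exactly 100 yields 99 rows), and a group with a quota of 1 or less contributes nothing. Using `<=` instead would give each group its quota rounded down.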