statistical-sampling

Simple random sampling while pulling data from a warehouse (Oracle engine) using PROC SQL in SAS

喜你入骨 posted on 2019-12-12 04:15:37
Question: I need to pull a very large amount of data, about 600-700 variables from different tables in a data warehouse. In its raw form the dataset will easily reach 150 GB (79 million rows), but for my analysis I need only a million rows. How can I pull the data with PROC SQL directly from the warehouse while doing simple random sampling on the rows? The code below won't work because ranuni is not supported by Oracle: proc sql outobs =1000000; select * from connection to oracle( select * from tbl1 order by ranuni

Random sampling without replacement in longitudinal data

落爺英雄遲暮 posted on 2019-12-07 21:03:16
Question: My data is longitudinal.

VISIT  ID   VAR1
1      001  ...
1      002  ...
1      003  ...
1      004  ...
...
2      001  ...
2      002  ...
2      003  ...
2      004  ...

Our end goal is to pick 10% of each visit to run a test. I tried using PROC SURVEYSELECT to do SRS without replacement, with "VISIT" as the strata, but the final sample can contain duplicated IDs. For example, ID=001 might be selected in both VISIT=1 and VISIT=2. Is there any way to avoid that using SURVEYSELECT or another procedure (R is also fine)? Thanks a lot.

Answer 1: This

Random sampling without replacement in longitudinal data

久未见 posted on 2019-12-06 13:21:43
My data is longitudinal.

VISIT  ID   VAR1
1      001  ...
1      002  ...
1      003  ...
1      004  ...
...
2      001  ...
2      002  ...
2      003  ...
2      004  ...

Our end goal is to pick 10% of each visit to run a test. I tried using PROC SURVEYSELECT to do SRS without replacement, with "VISIT" as the strata, but the final sample can contain duplicated IDs. For example, ID=001 might be selected in both VISIT=1 and VISIT=2. Is there any way to avoid that using SURVEYSELECT or another procedure (R is also fine)? Thanks a lot.

This is possible with some fairly creative data step programming. The code below uses a greedy approach, sampling
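
The answer's own data step code is cut off in this excerpt, so what follows is not that code. As a rough illustration of what a greedy approach to this problem could look like, here is a minimal Python sketch. It assumes each ID appears at most once per visit, that visits are processed in ascending order, and that each visit draws its 10% only from IDs that no earlier visit has already used. The function name `sample_without_repeats` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_without_repeats(rows, fraction=0.10, seed=0):
    """Greedy sketch: visit by visit, draw a simple random sample of IDs
    that have not been selected in any earlier visit.

    rows: iterable of (visit, subject_id) pairs (assumed layout).
    Returns a dict mapping visit -> sorted list of selected IDs.
    """
    rng = random.Random(seed)

    # Group the IDs observed at each visit.
    ids_by_visit = defaultdict(list)
    for visit, subject_id in rows:
        ids_by_visit[visit].append(subject_id)

    used = set()      # IDs already selected in an earlier visit
    selected = {}
    for visit in sorted(ids_by_visit):
        ids = ids_by_visit[visit]
        target = round(fraction * len(ids))
        # Only IDs not yet used are eligible for this visit.
        available = [i for i in ids if i not in used]
        # If too few unused IDs remain, take as many as are available.
        chosen = rng.sample(available, min(target, len(available)))
        used.update(chosen)
        selected[visit] = sorted(chosen)
    return selected


# Toy data: 3 visits, the same 40 IDs at each visit.
rows = [(v, f"{i:03d}") for v in (1, 2, 3) for i in range(1, 41)]
sample = sample_without_repeats(rows)
```

With this toy data each visit contributes 4 IDs and no ID appears under more than one visit. Note that later visits draw from a smaller pool than earlier ones, so whether this scheme suits a given analysis depends on how the visits and IDs overlap in the real data.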