Question
This question extends the excellent answer Robert Picard gave to "How to Randomly Assign to Groups of Different Sizes".
We have this dataset, which is the same as in the previous question but adds the year variable:
sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
gen year=20
replace year=30 if _n>15
replace year=40 if _n>35
If I just wanted to randomly reassign the values of reg across all observations (without regard to group), I could implement the answer to the previous post:
tempfile orig
save `orig'
keep reg
rename reg reg_new
set seed 234
gen double u = runiform()
sort u reg_new
merge 1:1 _n using `orig', nogen
How would the code be modified so that reg is shuffled, but only within year? For example, there are 15 observations where year==20. These observations should be shuffled separately from those in the other years.
Answer 1:
Shuffling one variable doesn't require any file choreography. This can probably be shortened:
sysuse auto, clear
set seed 2803
gen double shuffle = runiform()
* example 1: shuffle mpg across all observations
sort shuffle
gen long which = _n
sort mpg
gen mpg_new = mpg[which]
list which mpg*
* example 2: shuffle mpg separately within each group of foreign
bysort foreign (shuffle) : gen long which2 = _n
bysort foreign (mpg) : gen mpg2 = mpg[which2]
list which2 mpg mpg2, sepby(foreign)
All that said, I think sample does this so long as you specify the same sample size as the number of observations in the dataset. It's overkill because you get all the variables.
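Applied to the data in the question, the pattern from example 2 might look like the sketch below. It assumes the question's dataset is in memory as built by the setup code (with reg and year present, before the keep reg step) and borrows the names u, which and reg_new from the code above. I have not run it against that exact dataset, so treat it as a starting point.

```stata
* assumes the question's census extract (state, pop, reg, year) is in memory
set seed 234
gen double u = runiform()

* random position of each observation within its year
bysort year (u) : gen long which = _n

* order reg within year, then pick the value sitting at that random position;
* under by:, the subscript [which] counts within the year group
bysort year (reg) : gen reg_new = reg[which]

* the two cross-tabulations should show identical counts per year
tabulate reg year
tabulate reg_new year
```

Because which is a permutation of 1 to _N within each year, reg_new holds the same set of reg values per year as before, just in a different order, and no tempfile or merge is needed.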
Source: https://stackoverflow.com/questions/48887536/shuffle-one-variable-within-group