(This question is related to "How to generate a dataset of correlated variables with different distributions?")
In Stata, say that I create a random variable following a Uniform[0,1] distribution:
set seed 100
gen random1 = runiform()
I now want to create a second random variable that is correlated with the first (the correlation should be .75, say), but is bounded by 0 and 1. I would like this second variable to also be more-or-less Uniform[0,1]. How can I do this?
This won't be exact, but the NORTA (NORmal To Anything) or Gaussian copula method should come pretty close and is easy to implement.
The relevant citation is:
Cario, Marne C., and Barry L. Nelson. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois, 1997.
The general recipe to generate correlated random variables with any marginal distributions is:
- Draw two (or more) correlated variables from a joint standard normal distribution using corr2data.
- Compute the univariate standard normal CDF of each of these variables using normal(); each result is Uniform[0,1].
- Apply the inverse CDF of the target distribution to those uniform values to obtain draws from that distribution.
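The three steps can be sketched in Python for readers who want to see the mechanics outside Stata. This is a minimal illustration, not the answer's own code: it assumes Python 3.8+ for `statistics.NormalDist`, the helpers `norta_pairs`, `pearson` and `exp_inv_cdf` are hypothetical names, and Exponential(1) is an arbitrary example target for step 3. With only two variables, step 1 is done by hand: z2 = rho*z1 + sqrt(1 - rho^2)*e is a standard normal with correlation rho to z1.

```python
import math
import random
from statistics import NormalDist


def norta_pairs(rho, n, inv_cdf, seed=0):
    """Hypothetical helper: n pairs whose marginals are given by inv_cdf,
    coupled through standard normals with correlation rho."""
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF, like Stata's normal()
    pairs = []
    for _ in range(n):
        # Step 1: two standard normals with correlation rho
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: the normal CDF turns each into a Uniform(0,1) value
        u1, u2 = phi(z1), phi(z2)
        # Step 3: inverse CDF of the target distribution
        pairs.append((inv_cdf(u1), inv_cdf(u2)))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def exp_inv_cdf(u):
    """Inverse CDF of Exponential(1), used here only as an example target."""
    return -math.log1p(-u)


xs, ys = zip(*norta_pairs(0.75, 20000, exp_inv_cdf))
print(sum(xs) / len(xs), pearson(xs, ys))
```

Because the exponential inverse CDF is nonlinear, the Pearson correlation of the output comes out somewhat below the normal correlation of 0.75, while the sample mean stays near 1.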
The third step is trivial for the Uniform[0,1] case: its inverse CDF is the identity, so you can skip it. Typically, the correlations you end up with will be somewhat smaller in magnitude than the original (normal) correlations, so it can be useful to bump the latter up a bit.
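For the Uniform[0,1] case this shrinkage has a closed form. If Z_1 and Z_2 are standard normals with correlation ρ_Z and U_i = Φ(Z_i), the Pearson correlation of the uniforms equals the Spearman rank correlation of the normals, which for the bivariate normal is a standard result:

```latex
\rho_U = \frac{6}{\pi}\arcsin\!\left(\frac{\rho_Z}{2}\right),
\qquad
\rho_Z = 0.75 \;\Longrightarrow\; \rho_U = \frac{6}{\pi}\arcsin(0.375) \approx 0.734,
\qquad
\rho_Z = 2\sin\!\left(\frac{\pi\,\rho_U}{6}\right).
```

The last expression, obtained by solving the first for ρ_Z, gives the normal correlation needed for the uniforms to reach a chosen ρ_U.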
Stata code for two uniform-ish variables with a correlation of about 0.75:
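clear
// Step 1: draw standard normals with correlation .75
matrix C = (1, .75 \ .75, 1)
corr2data x y, n(10000) corr(C) double
corr x y, means
// Steps 2-3: the normal CDF maps each variable to Uniform[0,1]
replace x = normal(x)
replace y = normal(y)
// Make sure things worked: means near .5 and a correlation a little below .75
corr x y, means
// Stack x and y into a single variable z to plot their histograms back to back
stack x y, into(z) clear
lab define vars 1 "x" 2 "y"
lab val _stack vars
// bihist is user-written; install it from SSC if needed
capture ssc install bihist
bihist z, by(_stack) density tw1(yline(-1 0 1))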
To improve the approximation for the uniform case, you can transform the correlations as follows (see section 5 of the Cario and Nelson paper):
matrix C = (1,2*sin(.75*_pi/6)\2*sin(.75*_pi/6),1)
This sets the normal correlation to 0.76536686 instead of 0.75, so that the resulting uniforms have a correlation much closer to 0.75.
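A quick Python check of this adjustment, written as a hypothetical sketch using only the standard library (Python 3.8+ for `statistics.NormalDist`): it simulates uniform pairs once from a normal correlation of 0.75 and once from the adjusted value 2*sin(0.75*pi/6), then compares the Pearson correlations of the uniforms. It draws an ordinary random sample, so the estimates carry a little sampling noise; the helper names are illustrative.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def uniform_pair_corr(rho_z, n=20000, seed=1):
    """Correlation of (Phi(Z1), Phi(Z2)) for standard normals with correlation rho_z."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    us, vs = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
        us.append(phi(z1))
        vs.append(phi(z2))
    return pearson(us, vs)


target = 0.75
adjusted = 2 * math.sin(target * math.pi / 6)  # same as 2*sin(.75*_pi/6) in Stata
r_plain = uniform_pair_corr(target)        # expected to fall a bit short of 0.75
r_adjusted = uniform_pair_corr(adjusted)   # expected to land close to 0.75
print(round(adjusted, 8))  # 0.76536686
print(r_plain, r_adjusted)
```

With n = 20000 the unadjusted correlation should come out near 0.734 and the adjusted one near 0.75. Using the same seed for both runs reuses the same normal draws, which makes the comparison less noisy.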
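Code for the question in the comments (four variables with a specified correlation matrix)
Here the correlation matrix C is written more compactly, as its lower triangle (hence cstorage(lower)), and the transformation is applied to each off-diagonal entry: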
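clear
// Lower triangle of C, row by row; off-diagonal entries pass through 2*sin(r*_pi/6)
matrix C = ( 1, ///
2*sin(-.46*_pi/6), 1, ///
2*sin(.53*_pi/6), 2*sin(-.80*_pi/6), 1, ///
2*sin(0*_pi/6), 2*sin(-.41*_pi/6), 2*sin(.48*_pi/6), 1 )
corr2data v1 v2 v3 v4, n(10000) corr(C) cstorage(lower)
// Map each standard normal variable to Uniform[0,1]
forvalues i=1/4 {
    replace v`i' = normal(v`i')
}
// Check the resulting correlations
corr v1-v4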
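For readers working outside Stata, here is a hypothetical Python sketch of the same four-variable recipe using only the standard library (Python 3.8+). It takes the six target correlations from the Stata matrix, applies the same 2*sin(pi*r/6) adjustment, and replaces corr2data with a hand-written Cholesky factorization. One difference: corr2data makes the sample correlations of the normal draws match C exactly, whereas this sketch draws an ordinary random sample, so the simulated correlations scatter slightly around their targets. The Cholesky step fails if the adjusted matrix is not positive definite, in which case those targets cannot be reached this way.

```python
import math
import random
from statistics import NormalDist

# Target correlations among the four uniforms (from the Stata matrix)
TARGET = [
    [1.00, -0.46, 0.53, 0.00],
    [-0.46, 1.00, -0.80, -0.41],
    [0.53, -0.80, 1.00, 0.48],
    [0.00, -0.41, 0.48, 1.00],
]


def cholesky(a):
    """Lower-triangular L with L L^T = a; fails (ValueError or
    ZeroDivisionError) if a is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def correlated_uniforms(target, n, seed=0):
    """k columns of n draws: Uniform(0,1) marginals, Gaussian-copula dependence."""
    k = len(target)
    # Adjust each off-diagonal target; keep the diagonal at exactly 1
    c = [[1.0 if i == j else 2.0 * math.sin(math.pi * target[i][j] / 6.0)
          for j in range(k)] for i in range(k)]
    L = cholesky(c)
    rng = random.Random(seed)
    phi = NormalDist().cdf
    cols = [[] for _ in range(k)]
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for i in range(k):
            z = sum(L[i][j] * e[j] for j in range(i + 1))  # z = L e has covariance c
            cols[i].append(phi(z))
    return cols


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


cols = correlated_uniforms(TARGET, 20000)
for i in range(4):
    for j in range(i):
        r = pearson(cols[i], cols[j])
        print(f"v{i + 1}-v{j + 1}: target {TARGET[i][j]:+.2f}, simulated {r:+.3f}")
```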
Source: https://stackoverflow.com/questions/32718752/how-to-generate-correlated-uniform0-1-variables