Question
I would like to generate a random 1000×1000 correlation matrix in R whose average correlation (excluding the diagonal) is 0.3.
I looked at genPositiveDefMat
from the clusterGeneration package, but I couldn't figure out how to specify a given average correlation.
Answer 1:
A boring example of such a matrix would be
C = (1-m)*I + m*U*U'
where I is the identity matrix, U a vector of all ones and m = 0.3. C is positive definite and the average (indeed every) off-diagonal element is m.
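As a sanity check, this "boring" equicorrelation construction can be sketched in a few lines of numpy (the question asks about R, but the arithmetic is identical and translates directly):

```python
import numpy as np

# The equicorrelation matrix C = (1 - m)*I + m*U*U' with U a vector of ones.
dim, m = 1000, 0.3
U = np.ones(dim)
C = (1 - m) * np.eye(dim) + m * np.outer(U, U)

# Every diagonal entry is 1 and every off-diagonal entry is exactly m,
# so the off-diagonal average is m.
avg_off = C[~np.eye(dim, dtype=bool)].mean()

# Positive definite: the eigenvalues are 1 - m (with multiplicity dim - 1)
# and 1 - m + m*dim, all positive for 0 < m < 1.
min_eig = np.linalg.eigvalsh(C).min()
```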
So we could try generating a matrix of the form
C = D + alpha*U*U'
where D is diagonal and positive definite, alpha a positive scalar and U a 'random' vector. Such a matrix will be positive definite. The off-diagonal elements are alpha*U[i]*U[j] (i ≠ j), and their sum is alpha*(S*S - T), so for their average to be m a little algebra shows
alpha = dim*(dim-1)*m / (S*S-T)
where
S = Sum_i U[i]
T = Sum_i U[i]*U[i]
As long as all the elements of U are positive (and dim > 1), we will have
S*S > T
since S*S - T = Sum_{i≠j} U[i]*U[j] is a sum of positive cross terms, and so alpha will be positive.
For the diagonal elements of C to be 1.0, we require
D[i] = 1 - alpha*U[i]*U[i] (i=1..dim)
and all of these must be positive.
Alas, I have been unable to determine theoretically how the elements of U should be chosen to guarantee this. Experimentally, however, when the elements of U are uniform random numbers between 1.0 and 5.0, I've not seen a case where any of the D[i] are negative.
The upper limit for the elements of U (5.0 above) controls how spread out the individual correlations are. With 5.0 they vary between around 0.03 and 0.8, while with an upper limit of 2.0 they vary between around 0.13 and 0.53. Choosing the upper limit too high increases the likelihood of the method failing (some D[i] negative).
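Putting the whole recipe together, here is a hedged sketch in numpy (again, the question asks for R, but the translation is mechanical; the uniform draws between 1.0 and 5.0 are the experimental choice described above, not a theoretical guarantee):

```python
import numpy as np

rng = np.random.default_rng(42)
dim, m = 1000, 0.3

# 'Random' positive vector U, uniform between 1.0 and 5.0 as suggested above.
U = rng.uniform(1.0, 5.0, size=dim)
S = U.sum()
T = (U * U).sum()
alpha = dim * (dim - 1) * m / (S * S - T)

# Diagonal entries D[i] = 1 - alpha*U[i]^2; the method fails if any is <= 0.
D = 1.0 - alpha * U * U
if D.min() <= 0:
    raise RuntimeError("method failed: some D[i] <= 0; try a smaller upper limit for U")

# C = D + alpha*U*U' is positive definite: positive diagonal D plus a
# positive-semidefinite rank-1 term. Its diagonal is 1 by construction,
# and the off-diagonal average is exactly m.
C = np.diag(D) + alpha * np.outer(U, U)
off = C[~np.eye(dim, dtype=bool)]
```

With the 1.0–5.0 range the off-diagonal entries alpha*U[i]*U[j] span roughly 0.03 to 0.8, matching the spread described above.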
Source: https://stackoverflow.com/questions/36357159/generating-random-correlation-matrix-with-given-average-correlation