how to generate pseudo-random positive definite matrix with constraints on the off-diagonal elements?

后端 未结 4 1420
误落风尘
误落风尘 2021-01-13 12:24

The user wants to impose a unique, non-trivial, upper/lower bound on the correlation between every pair of variable in a var/covar matrix.

For example: I want a vari

4条回答
  •  梦谈多话
    2021-01-13 12:37

    Perhaps this answer will help operationalize it:

    One class of matrixes that has this property of non-negative definiteness is the Wishart Distribution. And samples from ~W() such that all non-diagonal entries are between some bounds [l,u] would fit your question. However, I don't believe this is the same as the distribution of all positive-definite matrices with non-diagonals in [l,u].

    On the wikipedia page there is an algorithm for calculating from ~W().

    A simpler, hackish solution (possibly approximating this) is to:

    (given that u>l and l>0)

    1. draw from a multivariate normal where Sigma = mean(l,u).
    2. Then taking the sample, calculating its correlation matrix => C
    3. This matrix will have some randomness (fuzz), but the math of how much fuzz it will have is a little out of my my league to calculate. The values of the off-diags in this C matrix are bounded by [-1,1], with mean of mean(l,u). By eyeball, I'm guessing some sort of beta/exponential. In any case, that continuous distribution of the off diags in the C guarantees it won't behave and lie inside the bounds (l,u), unless (l,u) = [-1,1].
    4. You can adjust the amount of "fuzz" by increasing / decreasing the length of the sample in step 1. I'd wager (unproven) that the amount of variance in C's odd-diags is proportional to the square-root of the number of samples.

    So this seems non-trivial to truly answer!

    As other posters have suggested, you can generate from Wishart, then keep the ones where the property you want is true, but you might be sampling for a long time! If you exclude those who are 0-definite (is that a word?) then this should work fine for generating good matrices. However this is not the true distribution of all pos-def matrices whose off-diags are in [l,u].

    Code (in R) for dumb-sampling scheme proposed above

    sigma1 <- function(n,sigma) {
        out <- matrix(sigma,n,n)
        diag(out) <- 1
        return (out)
    }
    
    library(mvtnorm)
    sample_around_sigma <- function(size, upper,lower, tight=500) {
        #  size:  size of matrix
        #  upper, lower:  bounds on the corr, should be > 0
        #  tight:  number of samples to use.  ideally this
        #     would be calcuated such that the odd-diags will
        #     be "pretty likely" to fall in [lower,upper]
        sigma <- sigma1(size,mean(c(upper,lower)))
        means <- 0*1:size
        samples <- rmvnorm(n=tight, mean=means,sigma=sigma)
        return (cor(samples))
    }
    
    > A <- sample_around_sigma(5, .3,.5)
    > A
              [,1]      [,2]      [,3]      [,4]      [,5]
    [1,] 1.0000000 0.3806354 0.3878336 0.3926565 0.4080125
    [2,] 0.3806354 1.0000000 0.4028188 0.4366342 0.3801593
    [3,] 0.3878336 0.4028188 1.0000000 0.4085453 0.3814716
    [4,] 0.3926565 0.4366342 0.4085453 1.0000000 0.3677547
    [5,] 0.4080125 0.3801593 0.3814716 0.3677547 1.0000000
    > 
    > summary(A[lower.tri(A)]); var(A[lower.tri(A)])
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     0.3678  0.3808  0.3902  0.3947  0.4067  0.4366 
    [1] 0.0003949876
    

提交回复
热议问题