Create correlated variables following various distributions

Submitted by 一曲冷凌霜 on 2019-12-14 03:48:39

Question

In R, I would like to create n variables of length L whose relationship is given by a correlation matrix called cor_matrix. The important point is that the n variables may follow different distributions (including a mix of continuous and discrete distributions).

Related posts

  1. how-to-generate-sample-data-with-exact-moments

  2. generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable

  3. r-constructing-correlated-variables

Modified from the third post listed above, the following is a solution for the case where all n variables are continuous and come from the same family of distributions.

library(psych) 

set.seed(199)

fun = function(cor_matrix, list_distributions, L)
{
    n = length(list_distributions)
    if (ncol(cor_matrix) != nrow(cor_matrix)) stop("cor_matrix is not square")
    if (nrow(cor_matrix) != n) stop("the length of list_distributions should match the number of columns and rows of cor_matrix")
    if (L<=1) stop("L should be > 1")

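    # Unrotated principal components of the target correlation matrix
    fit = principal(cor_matrix, nfactors=n, rotate="none")
    loadings = matrix(fit$loadings[1:n, 1:n], nrow=n, ncol=n, byrow=FALSE)
    # One row per variable, each drawn independently from its own distribution
    cases = t(sapply(1:n, FUN=function(i, L) list_distributions[[i]](L), L=L))
    # Mix the independent draws through the loadings to induce correlation
    multivar = loadings %*% cases
    T_multivar = t(multivar)
    vars = as.data.frame(T_multivar)
    return(vars)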
}

L = 1000
cor_matrix =  matrix(c (1.00, 0.90, 0.20 ,
                     0.90, 1.00, 0.40 ,
                     0.20, 0.40, 1.00), 
                  nrow=3,ncol=3,byrow=TRUE)

list_distributions = list(function(L)rnorm(L,0,2), function(L)rnorm(L,10,10), function(L) rnorm(L,0,1))
vars = fun(cor_matrix, list_distributions, L)
cor(vars)
plot(vars)

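However, this approach cannot create correlated variables when the distributions differ, as in the following example, which mixes a normal, a rounded (discrete) normal, and a uniform distribution: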

list_distributions = list(function(L)rnorm(L,0,2), function(L)round(rnorm(L,10,10)), function(L) runif(L,0,1))
vars = fun(cor_matrix, list_distributions, L)
cor(vars)
plot(vars)
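
One way to see why this fails: mixing independent draws through a loading matrix produces linear combinations of them, so each output column no longer follows its intended distribution. The sketch below illustrates this in base R. It is only an illustration, and `loadings` is built with `eigen()` as an assumed stand-in for the unrotated loadings returned by `principal()`; the names `still_integer` and `still_unit_interval` are hypothetical.

```r
set.seed(199)

L <- 1000
cor_matrix <- matrix(c(1.00, 0.90, 0.20,
                       0.90, 1.00, 0.40,
                       0.20, 0.40, 1.00),
                     nrow = 3, ncol = 3, byrow = TRUE)

# Assumption: eigenvectors scaled by sqrt(eigenvalues) stand in for the
# unrotated principal-component loadings used in the question.
e <- eigen(cor_matrix)
loadings <- e$vectors %*% diag(sqrt(e$values))

# Independent draws, one row per variable
cases <- rbind(rnorm(L, 0, 2),
               round(rnorm(L, 10, 10)),
               runif(L, 0, 1))

# Every output column is a weighted sum of all three inputs
vars <- t(loadings %*% cases)

# Check whether the second column is still integer-valued
still_integer <- all(vars[, 2] == round(vars[, 2]))
# Check whether the third column is still confined to [0, 1]
still_unit_interval <- all(vars[, 3] >= 0 & vars[, 3] <= 1)

print(c(still_integer = still_integer,
        still_unit_interval = still_unit_interval))
```

Both checks come out `FALSE` here, which suggests that the marginal distributions are lost as soon as the draws are mixed.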


Answer 1:


Using copulas as suggested by @NatePope and @JoshO'Brien

library(mvtnorm)

set.seed(199)

fun = function(cor_matrix, list_distributions, L)
{
    n = length(list_distributions)
    # Correlated Gaussian variables
    Gauss = rmvnorm(n=L, mean = rep(0,n), sigma=cor_matrix)
    # convert them to uniform distribution.
    Unif = pnorm(Gauss) 
    # Convert them to whatever I want
    vars = sapply(1:n, FUN = function(i) list_distributions[[i]](Unif[,i]))
    return(vars)
}

L = 2000
cor_matrix =  matrix(c (1.00, 0.90, 0.80 ,
                     0.90, 1.00, 0.6,
                     0.80, 0.6, 1.00), 
                  nrow=3,ncol=3,byrow=TRUE)

# Each element is a quantile function applied to a vector of probabilities p
list_distributions = list(function(p) qpois(p,7), function(p) round(qnorm(p,100,10)), function(p) qnorm(p,-100,1))

vars = fun(cor_matrix, list_distributions, L)
cor(vars)
plot(as.data.frame(vars))

The drawback of this solution is that it first generates correlated normally distributed variables and only then transforms them into uniformly distributed ones. There is probably a more efficient solution that generates correlated uniformly distributed variables directly.
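
As a rough check of the three steps (correlated Gaussians, then uniforms via `pnorm`, then target margins via quantile functions), the sketch below compares correlations before and after the transformation. It is a minimal illustration, not the answer's exact code: it draws the Gaussians with a Cholesky factor from base R `chol()` instead of `mvtnorm::rmvnorm`, and the exponential margin and the names `spearman_gauss`, `spearman_vars`, and `pearson_vars` are assumptions added for the example. Because strictly increasing transforms preserve ranks, the Spearman correlation between continuous margins carries over from the Gaussian draws, whereas the Pearson correlation of the transformed variables need not equal the corresponding entry of `cor_matrix`.

```r
set.seed(199)

L <- 2000
cor_matrix <- matrix(c(1.00, 0.90, 0.80,
                       0.90, 1.00, 0.60,
                       0.80, 0.60, 1.00),
                     nrow = 3, ncol = 3, byrow = TRUE)
n <- ncol(cor_matrix)

# Step 1: correlated standard normals. chol() returns upper-triangular R
# with t(R) %*% R == cor_matrix, so Z %*% R has covariance cor_matrix.
Z <- matrix(rnorm(L * n), nrow = L, ncol = n)
gauss <- Z %*% chol(cor_matrix)

# Step 2: map each column to (0, 1) with the normal CDF
unif <- pnorm(gauss)

# Step 3: apply the quantile function of each target distribution
vars <- cbind(qpois(unif[, 1], 7),        # discrete margin
              qexp(unif[, 2], 0.5),       # continuous, skewed margin (assumed)
              qnorm(unif[, 3], -100, 1))  # continuous, normal margin

# Rank correlation before and after the transformation
spearman_gauss <- cor(gauss, method = "spearman")
spearman_vars  <- cor(vars, method = "spearman")

# Linear correlation actually achieved, to compare against cor_matrix
pearson_vars <- cor(vars)

print(round(spearman_gauss, 3))
print(round(spearman_vars, 3))
print(round(pearson_vars, 3))
```

If a specific Pearson correlation is required on the transformed scale, the input matrix would have to be adjusted by trial or by a numerical search; this sketch does not attempt that.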



Source: https://stackoverflow.com/questions/32365016/create-correlated-variables-following-various-distributions
