Simulating correlated Bernoulli data

Posted by 删除回忆录丶 on 2020-01-15 06:27:28

Question


I want to simulate 100 observations of 5 binary columns, with a correlation of 0.5 between every pair of columns. To do this, I first set up the following correlation matrix:

F1 <- matrix( c(1, .5, .5, .5,.5,
                   .5, 1, .5, .5,.5,
                   .5, .5, 1, .5,.5,
                   .5, .5, .5, 1,.5,
                   .5, .5, .5, .5,1
), 5,5)

To simulate the intended data frame I tried the following, but it does not work properly:

 df2 <- as.data.frame (rbinom(100, 1,.5),ncol(5), F1)

Answer 1:


I'm surprised this isn't a duplicate (a related question deals specifically with non-binary responses, i.e. binomial with N>1). The bindata package does what you want.

library(bindata)
## set up correlation matrix (compound-symmetric with rho=0.5)
m <- matrix(0.5,5,5)
diag(m) <- 1

Simulate with a mean of 0.5 (as in your example):

set.seed(101)
## this simulates 10 rather than 100 realizations
## (I didn't read your question carefully enough)
## but it's easy to change
r <- rmvbin(n=10, margprob=rep(0.5,5), bincorr=m)
## round the sample correlation matrix to 2 decimal places
round(cor(r), 2)

Results

 1.00 0.22  0.80  0.05 0.22
 0.22 1.00  0.00  0.65 1.00
 0.80 0.00  1.00 -0.09 0.00
 0.05 0.65 -0.09  1.00 0.65
 0.22 1.00  0.00  0.65 1.00
  • This looks wrong, because the sample correlations aren't exactly 0.5, but on average they will be: when I sampled 10,000 vectors rather than 10, the values ranged from about 0.48 to 0.51. Equivalently, if you simulated many samples of 10 and computed the correlation matrix for each, you should find that the expected (average) correlation matrix is correct.
  • Simulating values whose sample correlation is exactly equal to the specified value is much harder (and not necessarily what you want to do anyway, depending on the application).
  • Note that there are limits on which mean vectors and correlation matrices are feasible. For example, the off-diagonal elements of an n-by-n compound-symmetric (equal-correlation) matrix can't be less than -1/(n-1). Similarly, there may be limits on what correlations are possible for a given set of means (this may be discussed in the technical reference; I haven't checked).
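
The bound in the last point can be checked numerically. The following is a minimal sketch in base R, assuming only that a valid correlation matrix must be positive semidefinite. The helper name is_feasible_cs is hypothetical (it is not part of bindata), and passing this check is necessary but not sufficient for binary data, where the means may impose further limits.

```r
## Hypothetical helper: is a k-by-k equal-correlation matrix with
## off-diagonal value rho positive semidefinite?
is_feasible_cs <- function(rho, k, tol = 1e-8) {
  m <- matrix(rho, k, k)
  diag(m) <- 1
  ## the eigenvalues are 1 + (k - 1) * rho (once) and 1 - rho (k - 1 times)
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  min(ev) >= -tol
}

k <- 5
is_feasible_cs(0.5, k)    # rho = 0.5 is above the bound -1/(k-1) = -0.25
is_feasible_cs(-0.3, k)   # rho = -0.3 is below the bound
```

For k = 5 the smallest eigenvalue at rho = -0.3 is 1 + 4 * (-0.3) = -0.2, so that matrix cannot be a correlation matrix, whereas rho = 0.5 gives eigenvalues 3 and 0.5.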

The reference for this method is

Leisch, Friedrich and Weingessel, Andreas and Hornik, Kurt (1998) On the generation of correlated artificial binary data. Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science", 13. SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, Vienna. https://epub.wu.ac.at/286/
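
As a rough illustration of the idea behind this method (the bindata documentation describes rmvbin as thresholding a normal distribution), here is a base-R sketch for the special case used above, where every marginal probability is 0.5. It assumes the standard bivariate-normal identity that thresholding at zero turns a normal correlation rho_norm into a binary correlation (2/pi) * asin(rho_norm), so rho_norm = sin(pi * 0.5 / 2) is needed to target 0.5. All variable names are illustrative, and this is not the package's actual implementation.

```r
set.seed(101)
n_obs   <- 10000   # large sample, so sample correlations sit near the target
k       <- 5
rho_bin <- 0.5     # target correlation between the binary columns

## assumption: all marginal probabilities are 0.5, so the threshold is 0
## and the required normal correlation has a closed form
rho_norm <- sin(pi * rho_bin / 2)

## compound-symmetric correlation matrix for the latent normals
sigma <- matrix(rho_norm, k, k)
diag(sigma) <- 1

## correlated normals via the Cholesky factor (t(R) %*% R == sigma)
z <- matrix(rnorm(n_obs * k), n_obs, k) %*% chol(sigma)

## threshold at zero to get 0/1 columns with mean about 0.5
x <- (z > 0) * 1L
df2 <- as.data.frame(x)

round(cor(x), 2)
```

With 10,000 rows the off-diagonal entries should land close to 0.5. For marginal probabilities other than 0.5 there is no equally simple closed form for the latent correlation, which is the part rmvbin takes care of.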



Source: https://stackoverflow.com/questions/59595292/simulating-correlated-bernoulli-data
