How to calculate multivariate normal distribution function in R

偶尔善良 提交于 2019-12-10 21:07:53

问题


Here's what I tried, making use of the mvtnorm package

Sample Dataset

library(mvtnorm)

set.seed(2357)
df <- data.frame(
  x = rnorm(1000, mean=80, sd=20),
  y = rnorm(1000, mean=0, sd=5),
  z = rnorm(1000, mean=0, sd=5)
)

head(df)
      x      y       z
1 70.38  1.307  0.2005
2 59.76  5.781 -3.5095
3 54.14 -1.313 -1.9022
4 79.91  7.754 -6.2076
5 87.07  1.389  1.1065
6 75.89  1.684  6.2979

Fit multivariate normal dist and check P(x <= 80) ~ 0.5

# Get the dimension means and correlation matrix
means <- c(x=mean(df$x), y=mean(df$y), z=mean(df$z))
corr <- cor(df)

# Check P(x <= 80)
sum(df$x <= 80)/nrow(df)  # 0.498
pmvnorm(lower=-Inf, upper=c(80, Inf, Inf), mean=means, corr=corr)  # 0.8232

Why is the fitted result 0.82? Where did I go wrong?


回答1:


First, you don't need to simulate anything to study the pmvnorm function:

pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(80,0,0), corr=diag(rep(1,3)))

The result is 0.5, as you expected.

Your means vector is approximately (79, 0, 0), so let's try it:

pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(79,0,0), corr=diag(rep(1,3)))

The result now is 0.8413447. There's nothing the matter. By specifying only the correlation matrix, you told the software to assume that all variances were unity. In your simulation, the variances were 400, 25, and 25: very different from what you specified in the arguments!

The correct calculation uses the covariance matrix of the data, not its correlation matrix:

pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=means, sigma=cov(df))

The result is 0.5178412, quite in keeping with the data.



来源:https://stackoverflow.com/questions/40617821/how-to-calculate-multivariate-normal-distribution-function-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!