What is difference between replicate n times and generate n directly in sampling of R?

本小妞迷上赌 提交于 2020-01-06 02:55:07

问题


I am asked to "simulate x as an independent identically distributed (iid) normal variable with mean=0, std=1.5 with sample length 500"

I am doing the sampling in following two ways:

set.seed(8402)
X <- rnorm(500, 0, 1.5)
head(X)

and I got

-1.8297969 -0.1862884 1.4219400 -1.0841421 -1.5276701 1.6159368

However, if I do

X <- replicate(500, rnorm(1,0,1.5))
head(X)

and I got

-0.04032755 0.92002552 -2.28001943 -1.36840869 1.49820718 0.06205003

My question is what is the right way to generate iid normal variable? What is the difference between those two ways?

Many thanks!


回答1:


R Internal

Internally in R, the C function from <Rmath.h>: double rnorm (double mean, double sd) function generates one random number at a time. When you call its R wrapper function rnorm(n, mean, sd), it calls the C level function n times.

This is as same as you call R level function only once with n = 1, but replicate it n times using replicate.

The first method is much faster (possibly the difference will be seen when n is really large), as everything is done at C level. replicate however, is a wrapper of sapply, so it is not really a vectorized function (read on Is the "*apply" family really not vectorized?).

In addition, if you set the same random seed for both, you are going to get the same set of random numbers.


A more illustrative experiment

In my comment below, I say that random seed is only set once on entry. To help people understand this, I provide this example. There is no need to use large n. n = 4 is sufficient.

First, let's set seed at 0, while generating 4 standard normal samples:

set.seed(0); rnorm(4, 0, 1)
## we get
[1]  1.2629543 -0.3262334  1.3297993  1.2724293

Note that in this case, all 4 numbers are obtained from the entry seed 0.

Now, let's do this:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## do not reset seed, but continue with the previous seed
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.329799 1.272429

See?

But if we reset seed in the middle, for example, set it back to 0

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## reset seed
set.seed(0)
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.2629543 -0.3262334

This is what I mean by "entry".



来源:https://stackoverflow.com/questions/36428415/what-is-difference-between-replicate-n-times-and-generate-n-directly-in-sampling

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!