Difference between runif and sample in R?

Posted by 梦想的初衷 on 2020-01-12 08:21:09

Question


How do runif and sample differ in terms of the probability distribution they use? I know that runif gives fractional numbers and sample gives whole numbers, but what I am interested in is whether sample also uses the uniform probability distribution.


Answer 1:


Consider the following code and output:

> set.seed(1)
> round(runif(10,1,100))
 [1] 27 38 58 91 21 90 95 66 63  7
> set.seed(1)
> sample(1:100, 10, replace=TRUE)
 [1] 27 38 58 91 21 90 95 67 63  7

This strongly suggests that, when asked to do the same thing, the two functions give nearly the same output (though, interestingly, it is round that comes close here rather than floor or ceiling, and the eighth value still differs: 66 versus 67). The main differences are in the defaults. If you leave those defaults alone, both draw from a uniform distribution, though sample draws from a discrete uniform and, by default, without replacement.

Edit

The more correct comparison is:

> set.seed(1)
> ceiling(runif(10,0,100))
 [1] 27 38 58 91 21 90 95 67 63  7

instead of using round.

We can even step that up a notch:

> set.seed(1)
> tmp1 <- sample(1:100, 1000, replace=TRUE)
> set.seed(1)
> tmp2 <- ceiling(runif(1000,0,100))
> all.equal(tmp1,tmp2)
[1] TRUE
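
A note on reproducing this today: in R 3.6.0 and later the default sampler behind sample() changed from "Rounding" to "Rejection", so with the defaults the two vectors above no longer match element for element (both are still uniform). A minimal sketch, assuming R >= 3.6.0, that switches back to the old sampler so the comparison can be rerun:

```r
# Assumes R >= 3.6.0, where set.seed() and RNGkind() accept sample.kind.
# "Rounding" is the pre-3.6.0 sampler; R warns when it is selected,
# so the warning is suppressed here.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
tmp1 <- sample(1:100, 1000, replace = TRUE)

suppressWarnings(set.seed(1, sample.kind = "Rounding"))
tmp2 <- ceiling(runif(1000, 0, 100))

# Element-wise comparison of the two vectors
print(all(tmp1 == tmp2))

# Restore the default sampler for the rest of the session
RNGkind(sample.kind = "Rejection")
```

The "Rounding" sampler is kept in R only for reproducing old results, so the default is the better choice for new simulations.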

Of course, if the prob argument to sample is used (with not all values equal), then the result will no longer be uniform.
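
To make the effect of unequal weights concrete, here is a small sketch. The weights 0.7, 0.2 and 0.1 are made up purely for illustration:

```r
# Hypothetical weights, chosen only for illustration
w <- c(0.7, 0.2, 0.1)

set.seed(1)
x <- sample(1:3, 10000, replace = TRUE, prob = w)

# Observed proportions should be close to the weights, not to 1/3 each
print(prop.table(table(x)))
```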




Answer 2:


sample draws from a fixed set of inputs. If the first argument is a single number n (with n >= 1), it samples from the integers 1:n, so the output is integer.

On the other hand, runif returns a sample from a real-valued range.

 > sample(c(1,2,3), 1)
 [1] 2
 > runif(1, 1, 3)
 [1] 1.448551
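
The single-number behaviour of sample is easy to trip over, so here is a short sketch of the three cases. The variable names a, b and u are just for illustration:

```r
set.seed(1)

a <- sample(5, 3)            # single number n >= 1: draws from 1:5
b <- sample(c(5, 7, 9), 3)   # longer vector: draws from the given values
u <- runif(3, 1, 3)          # real values between 1 and 3

print(a)
print(b)
print(u)
```

Because of the first case, sample(x, ...) on a vector x that happens to have length 1 samples from 1:x rather than returning x itself.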



Answer 3:


sample() runs faster than ceiling(runif()). This is useful to know if you are doing many simulations or bootstrapping.

A crude timing script that compares four equivalent approaches:

n <- 100                     # sample size
m <- 10000                   # simulations
system.time(sample(n, size = n*m, replace = TRUE))  # faster than ceiling/runif
system.time(ceiling(runif(n*m, 0, n)))
system.time(ceiling(n * runif(n*m)))
system.time(floor(runif(n*m, 1, n+1)))

The proportional time advantage increases with n and m, but watch that you don't fill memory!

By the way, don't use round() to convert uniformly distributed continuous values to uniformly distributed integers: each of the two end values gets selected only half as often as it should.
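
The end-value problem with round() can be checked empirically. This is a rough sketch; n = 5 and the number of draws are arbitrary choices:

```r
set.seed(1)
n <- 5
N <- 100000

# round(): 1 and n each cover half a unit of the range, the rest a full unit
biased <- round(runif(N, 1, n))
print(prop.table(table(biased)))    # roughly 1/8, 1/4, 1/4, 1/4, 1/8

# floor() over [1, n + 1): every integer covers exactly one unit
unbiased <- floor(runif(N, 1, n + 1))
print(prop.table(table(unbiased)))  # roughly 1/5 each
```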



Source: https://stackoverflow.com/questions/26978281/difference-between-runif-and-sample-in-r
