runif function in CUDA

Question

I am trying to implement a Metropolis-Hastings algorithm in CUDA. For this algorithm, I need to be able to generate many uniform random numbers over varying ranges. Therefore, I would like to have a function called runif(min, max) that returns a uniformly distributed number in the given range. This function has to be called multiple times inside another function that actually implements the algorithm.

Based on this post, I tried to put the code shown there into a function (see below). If I understood this correctly, the same state leads to the same sequences of numbers. So, if the state doesn't change, I always get the same output. One alternative would be to generate a new state inside the runif function so that each time the function is called, it is called with another state. As I've heard though, this is not advisable since the function gets slow.

So, what would be the best implementation of such a function? Should I generate a new state inside the function or generate a new one outside each time I call the function? Or is there yet another approach?

__device__ float runif(float a, float b, curandState state)
{
  float myrandf = curand_uniform_double(&state);
  myrandf *=  (b - a + 0.999999);
  myrandf += a;
  return myrandf;
}

Answer 1:

How it works

The curand_uniform* family of functions accepts a pointer to a curandState object, uses it to produce a number, and modifies it, so that the next curand_uniform* call made with the same state object returns the next number in the sequence instead of repeating the previous one.

The important point: to get meaningful results, the changes made to curandState must be written back.

Wrong way 1

Right now you are passing curandState by value, so the state changes are lost when runif returns, and the next call made with the same state produces the same number. On top of that, copying the state on every call wastes time.

Wrong way 2

Creating and initializing a new local state inside the function on every call will not only kill performance (and defeat the point of using CUDA) but will also give you the wrong distribution.

Right way

In the sample code you linked, curandState is passed by pointer, which guarantees that modifications are saved in the object the pointer refers to.
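Applied to the function from the question, passing the state by pointer could look like the sketch below. This is only an illustration, not the code from the linked post. It assumes a continuous range is wanted, so the + 0.999999 term (which suits integer ranges that are truncated afterwards) is dropped, and it uses curand_uniform instead of curand_uniform_double because the result is a float anyway.

```cuda
#include <curand_kernel.h>

// Sketch of a corrected runif: the state is taken by pointer, so the advance
// made by curand_uniform persists for the caller's next call.
__device__ float runif(float a, float b, curandState *state)
{
    // curand_uniform returns a float in (0, 1]: 0 is excluded, 1 is included.
    float u = curand_uniform(state);

    // Scale and shift to (a, b]. Assumption: a continuous range is wanted,
    // so nothing is added to (b - a).
    return a + (b - a) * u;
}
```

Calling runif(a, b, &state) repeatedly with the same state object then walks through the sequence instead of returning the same value each time.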

Usually, you would want to allocate and initialize an array of random states once in your program (before launching any kernels that require the RNG). Then, to generate numbers, you access this array from your kernels, with indices based on thread ids. Many states are required to avoid data races (at least one state per concurrently running curand_uniform* call).

This way you avoid the performance cost of copying and re-initializing states, and you get the distribution you expect.

See the cuRAND documentation for more information and sample code.
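A minimal sketch of that pattern follows. The kernel names setup_states and draw, the seed, the range and the launch sizes are all made up for illustration. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Same idea as the question's runif, but with the state passed by pointer.
__device__ float runif(float a, float b, curandState *state)
{
    return a + (b - a) * curand_uniform(state);  // value in (a, b]
}

// Run once: each thread initializes its own state. Same seed, different
// sequence number per thread, so the threads draw from separate sequences.
__global__ void setup_states(curandState *states, unsigned long long seed, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        curand_init(seed, id, 0, &states[id]);
}

// Hypothetical worker kernel: every thread draws several numbers from its
// own state and stores their mean.
__global__ void draw(curandState *states, float *out, int n, int draws,
                     float a, float b)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;

    curandState local = states[id];   // work on a local copy for speed
    float sum = 0.0f;
    for (int i = 0; i < draws; ++i)
        sum += runif(a, b, &local);   // each call advances the local state
    out[id] = sum / draws;

    states[id] = local;               // write the advanced state back
}

int main()
{
    const int n = 256, threads = 128;
    const int blocks = (n + threads - 1) / threads;

    curandState *states;
    float *out;
    cudaMalloc(&states, n * sizeof(curandState));
    cudaMalloc(&out, n * sizeof(float));

    setup_states<<<blocks, threads>>>(states, 1234ULL, n);   // once
    draw<<<blocks, threads>>>(states, out, n, 1000, 2.0f, 5.0f);

    float host[n];
    cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
    printf("mean of thread 0 draws: %f\n", host[0]);

    cudaFree(states);
    cudaFree(out);
    return 0;
}
```

Because draw writes the state back at the end, launching it again continues each thread's sequence from where it stopped, without calling curand_init a second time.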

Source: https://stackoverflow.com/questions/33846594/runif-function-in-cuda

Tags

random

cuda

distribution