Thrust

large integer addition with CUDA

丶灬走出姿态 submitted on 2019-11-27 04:38:09
Question: I've been developing a cryptographic algorithm on the GPU and am currently stuck on an algorithm to perform large integer addition. Large integers are represented in the usual way, as a bunch of 32-bit words. For example, we can use one thread to add two 32-bit words. For simplicity, assume that the numbers to be added are of the same length and that the number of threads per block equals the number of words. Then:

__global__ void add_kernel(int *C, const int *A, const int *B) {
    int x = A[threadIdx.x];
    int y = B[threadIdx.x];
    C[threadIdx.x] = x + y;   // carries between words are not yet handled
}
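The question leaves carry propagation open, which is the hard part of parallelizing big-integer addition. A serial host-side sketch of the word-by-word logic a correct kernel must reproduce (the function name add_bignum is illustrative, not from the question):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Serial reference for multiword addition: add word i of each operand plus
// the incoming carry, keep the low 32 bits, and ripple the high bit into
// word i+1. The result gets one extra word for the final carry-out.
std::vector<uint32_t> add_bignum(const std::vector<uint32_t>& a,
                                 const std::vector<uint32_t>& b) {
    std::vector<uint32_t> c(a.size() + 1, 0);
    uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        uint64_t s = (uint64_t)a[i] + (uint64_t)b[i] + carry;
        c[i] = (uint32_t)s;   // low 32 bits
        carry = s >> 32;      // 0 or 1
    }
    c[a.size()] = (uint32_t)carry;
    return c;
}
```

On the GPU, this data-dependent carry chain is typically parallelized by computing per-word carry generate/propagate flags and resolving them with a prefix scan, rather than rippling serially as above.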

Generating random numbers with uniform distribution using Thrust

非 Y 不嫁゛ submitted on 2019-11-27 03:32:43
Question: I need to generate a vector with random numbers between 0.0 and 1.0 using Thrust. The only documented example I could find produces very large random numbers (thrust::generate(myvector.begin(), myvector.end(), rand)). I'm sure the answer is simple, but I would appreciate any suggestions.

Answer 1: Thrust has random generators you can use to produce sequences of random numbers. To use them with a device vector you will need to create a functor which returns a different element of the random sequence for each index.
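As a rough illustration of the pattern the answer describes, here is a host-side analogue using the C++ standard library (std::default_random_engine and std::uniform_real_distribution stand in for their thrust:: counterparts, and uniform_at/make_uniform are names invented for this sketch). The key property is that the functor's output depends only on its index, never on call order, which is what a device-side functor needs:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Index-to-value functor: reseed, skip ahead i states, draw one uniform
// [0,1) value. Same (seed, i) always yields the same result.
struct uniform_at {
    unsigned seed;
    double operator()(std::size_t i) const {
        std::default_random_engine rng(seed);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        rng.discard(i);   // jump to position i of the stream
        return dist(rng);
    }
};

std::vector<double> make_uniform(std::size_t n, unsigned seed) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::vector<double> out(n);
    std::transform(idx.begin(), idx.end(), out.begin(), uniform_at{seed});
    return out;
}
```

In the Thrust version, the transform would run over a thrust::counting_iterator into a device vector, with the same discard-by-index trick inside the functor.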

How to cast thrust::device_vector&lt;int&gt; to a raw pointer

℡╲_俬逩灬. submitted on 2019-11-27 02:41:17
Question: I have a Thrust device_vector. I want to cast it to a raw pointer so that I can pass it to a kernel. How can I do so?

thrust::device_vector<int> dv(10);
// CAST TO RAW
kernel<<<bl, tpb>>>(pass raw);

Answer 1: You can do this using thrust::raw_pointer_cast. The device vector class has a member function data which returns a thrust::device_ptr to the memory held by the vector, which can be cast, something like this:

thrust::device_vector<int> dv(10);
int *dv_ptr = thrust::raw_pointer_cast(dv.data());

Cuda Random Number Generation

穿精又带淫゛_ submitted on 2019-11-26 23:35:24
Question: I was wondering what the best way is to generate one pseudo-random number between 0 and 49k that would be the same for each thread, using curand or something else. I prefer to generate the random numbers inside the kernel because I will have to generate one at a time, but about 10k times. I could use floats between 0.0 and 1.0, but I have no idea how to make my PRN available to all threads, because most posts and examples show how to have a different PRN for each thread. Thanks

Answer 1:
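One common way to get the same value in every thread is to initialize each thread's generator with an identical seed and sequence number: the n-th draw is then identical everywhere. A host-side illustration of the idea (draw_nth is a hypothetical helper; on the device the analogous calls would be curand_init with the same seed and sequence in every thread, followed by curand):

```cpp
#include <random>

// Every "thread" that seeds its generator with the same seed and skips to
// the same position n obtains the same number, so one logical random
// stream is effectively shared by all threads.
unsigned draw_nth(unsigned seed, unsigned long long n, unsigned range) {
    std::mt19937 rng(seed);
    rng.discard(n);         // advance to the n-th state
    return rng() % range;   // map into [0, range); has slight modulo bias
}
```

For 10k sequential draws, each thread would keep its own (identically seeded) generator state and advance it in lockstep, rather than re-seeding on every call.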

How to use Thrust to sort the rows of a matrix?

安稳与你 submitted on 2019-11-26 17:47:22
I have a 5000x500 matrix and I want to sort each row separately with CUDA. I can use ArrayFire, but this is just a for loop over thrust::sort, which should not be efficient. https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cuda/kernel/sort.hpp

for(dim_type w = 0; w < val.dims[3]; w++) {
    dim_type valW = w * val.strides[3];
    for(dim_type z = 0; z < val.dims[2]; z++) {
        dim_type valWZ = valW + z * val.strides[2];
        for(dim_type y = 0; y < val.dims[1]; y++) {
            dim_type valOffset = valWZ + y * val.strides[1];
            if(isAscending) {
                thrust::sort(val_ptr + valOffset,
                             val_ptr + valOffset + val.dims[0]);
            }
        }
    }
}
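A common Thrust alternative to a loop of per-row sorts is a "back-to-back" stable sort over the whole flattened matrix: stable-sort all elements by value, then stable-sort by row index. The sketch below demonstrates the trick host-side with std::stable_sort (sort_rows is an illustrative name; on the GPU the two passes would be thrust::stable_sort_by_key over the full 5000x500 buffer):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Row-wise sort via two stable sorts over the flattened matrix:
//   pass 1: stable sort every element by value;
//   pass 2: stable sort by row index.
// Stability means pass 2 regroups the rows without disturbing the value
// order established in pass 1, so each row comes out sorted.
void sort_rows(std::vector<int>& vals, std::size_t ncols) {
    std::vector<std::pair<std::size_t, int>> kv(vals.size());
    for (std::size_t i = 0; i < vals.size(); ++i)
        kv[i] = { i / ncols, vals[i] };   // (row index, value)

    std::stable_sort(kv.begin(), kv.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    std::stable_sort(kv.begin(), kv.end(),
        [](const auto& a, const auto& b) { return a.first < b.first; });

    for (std::size_t i = 0; i < vals.size(); ++i) vals[i] = kv[i].second;
}
```

Two sorts of 2.5M elements generally keep the GPU far busier than 5000 tiny sorts of 500 elements launched one after another.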

Thrust inside user written kernels

江枫思渺然 submitted on 2019-11-26 12:18:37
I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code. I would like to know whether I can pass a device_vector to my own kernel. How? If so, what operations are permitted on it inside kernel/device code?

As it was originally written, Thrust is purely a host-side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

thrust::device_vector<Foo> fooVector;
// Do something thrust-y with fooVector
Foo *fooArray = thrust::raw_pointer_cast(&fooVector[0]);
// fooArray can now be passed to a kernel as an ordinary device pointer
