thrust | 易学教程

How to calculate an average from an int2 array using Thrust

Question: I'm trying to calculate the average of an array that contains points (x, y). Is it possible to use Thrust to find the average point, represented as an (x, y) point? I could also represent the array as a thrust::device_vector<int> in which each cell contains the absolute position of the point, meaning i*numColumns + j, though I'm not sure that the average of those numbers represents the average cell. Thanks!

Answer 1:

    #include <iostream>
    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    struct add
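The two representations mentioned in the question can be compared with a small host-only sketch. It is not the answer's code: `Point` is a hypothetical stand-in for CUDA's `int2`, `std::accumulate` stands in for a `thrust::reduce` call with a custom binary functor, and treating x as the row i and y as the column j is an assumption.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for CUDA's int2 so the sketch builds without the CUDA toolkit.
struct Point { int x; int y; };

// Running sums are kept in long long to reduce the risk of overflow.
struct Sum { long long x; long long y; };

// The average point generally has fractional coordinates.
struct Average { double x; double y; };

// Component-wise average of a non-empty list of points.
Average average_point(const std::vector<Point>& pts) {
    assert(!pts.empty());
    const Sum total = std::accumulate(
        pts.begin(), pts.end(), Sum{0, 0},
        [](Sum acc, const Point& p) { return Sum{acc.x + p.x, acc.y + p.y}; });
    const double n = static_cast<double>(pts.size());
    return Average{total.x / n, total.y / n};
}

// Integer average of the linearized positions x * num_columns + y
// (assumption: x plays the role of row i, y the role of column j).
long long average_linear_index(const std::vector<Point>& pts, int num_columns) {
    assert(!pts.empty());
    long long total = 0;
    for (const Point& p : pts) {
        total += static_cast<long long>(p.x) * num_columns + p.y;
    }
    return total / static_cast<long long>(pts.size());
}
```

For the points (0, 3) and (1, 0) on a grid with 4 columns, the component-wise average is (0.5, 1.5), while the averaged linear index is 3, which decodes to cell (0, 3). The two disagree here, so averaging linear indices is not a safe substitute for averaging each coordinate.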

how fast is thrust::sort and what is the fastest radix sort implementation

Question: I'm a newbie to GPU programming. Recently, I've been trying to implement the GPU BVH construction algorithm based on a tutorial: http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/. In the first step of this algorithm, the Morton code (unsigned int) of every primitive is computed and sorted. The tutorial gives a reference time cost for computing and sorting the Morton codes of 12K objects: 0.02 ms, one thread per object: Calculate bounding box and assign

CUB (CUDA UnBound) equivalent of thrust::gather

Question: Due to some performance issues with the Thrust libraries (see this page for more details), I am planning on refactoring a CUDA application to use CUB instead of Thrust, specifically to replace the thrust::sort_by_key and thrust::inclusive_scan calls. At a particular point in my application I need to sort 3 arrays by key. This is how I did this with Thrust:

    thrust::sort_by_key(key_iter, key_iter + numKeys, indices);
    thrust::gather_wrapper(indices, indices + numKeys, thrust::make_zip
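The pattern the question describes, sorting the keys once and then reordering several arrays through the resulting indices, can be sketched on the host as follows. This is only an illustration of the gather semantics (`out[i] = src[perm[i]]`, which is what `thrust::gather` computes); the function names are hypothetical, and the sketch does not settle which CUB primitive, if any, is the right replacement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the permutation that stably sorts `keys` in ascending order:
// perm[i] is the original position of the i-th smallest key.
std::vector<int> sorted_permutation(const std::vector<int>& keys) {
    std::vector<int> perm(keys.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](int a, int b) { return keys[a] < keys[b]; });
    return perm;
}

// Gather: out[i] = src[perm[i]]. Works for any copyable element type,
// so the same permutation can reorder several payload arrays.
template <class T>
std::vector<T> gather(const std::vector<int>& perm, const std::vector<T>& src) {
    std::vector<T> out(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        assert(perm[i] >= 0 && static_cast<std::size_t>(perm[i]) < src.size());
        out[i] = src[perm[i]];
    }
    return out;
}
```

Computing the permutation once and applying it to each of the three arrays avoids sorting the keys three times; on the GPU the gather step is a simple one-thread-per-element kernel if no library call fits.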

Operating on thrust::complex types with thrust::transform

Question: I'm trying to use thrust::transform to operate on vectors of type thrust::complex<float>, without success. The following example blows up during compilation with several pages of errors.

    #include <cuda.h>
    #include <cuda_runtime.h>
    #include <cufft.h>
    #include <thrust/device_vector.h>
    #include <thrust/host_vector.h>
    #include <thrust/transform.h>
    #include <thrust/complex.h>
    int main(int argc, char *argv[]) {
      thrust::device_vector< thrust::complex<float> > d_vec1(4);
      thrust::device_vector<float> d

How to pass an array of vectors to cuda kernel?

Question: I now have

    thrust::device_vector<int> A[N];

and my kernel function

    __global__ void kernel(...) { auto a = A[threadIdx.x]; }

I know that via thrust::raw_pointer_cast I can pass a device_vector to a kernel. But how can I pass an array of vectors to it?

Answer 1: The really short answer is that you basically can't, and the longer answer is that you really shouldn't even if you discover or are presented with a hacky way of doing this. And in the spirit of that advice, what you can do is something

Retain Duplicates with Set Intersection in CUDA

Question: I'm using CUDA and Thrust to perform paired set operations. I would like to retain duplicates, however. For example:

    int keys[7] = {1, 1, 1, 3, 4, 5, 5};
    int vals[7] = {1, 2, 3, 4, 5, 6, 7};
    int comp[2] = {1, 5};
    thrust::set_intersection_by_key(keys, keys + 7, comp, comp + 2, vals, rk, rv);

Desired result: rk[1, 1, 1, 5, 5], rv[1, 2, 3, 6, 7]. Actual result: rk[1, 5], rv[5, 7]. I want all of the vals where the corresponding key is contained in comp. Is there any way to achieve this using Thrust,
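What the question asks for is a membership filter rather than a set intersection: keep every (key, value) pair whose key occurs in comp. A minimal host-only sketch of that idea follows; `filter_by_membership` is a hypothetical name, and it assumes comp is sorted so that a binary search is valid. One possible Thrust mapping, not confirmed by the source, would be a vectorized `thrust::binary_search` to produce flags followed by a stencil-based `thrust::copy_if`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps every (key, value) pair whose key appears in `comp`, preserving
// duplicates and input order. Assumes `comp` is sorted in ascending order.
std::pair<std::vector<int>, std::vector<int>>
filter_by_membership(const std::vector<int>& keys,
                     const std::vector<int>& vals,
                     const std::vector<int>& comp) {
    assert(keys.size() == vals.size());
    assert(std::is_sorted(comp.begin(), comp.end()));
    std::vector<int> rk;
    std::vector<int> rv;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // Membership test: O(log |comp|) per key.
        if (std::binary_search(comp.begin(), comp.end(), keys[i])) {
            rk.push_back(keys[i]);
            rv.push_back(vals[i]);
        }
    }
    return {rk, rv};
}
```

With keys {1, 1, 1, 3, 4, 5, 5}, values {1, 2, 3, 4, 5, 6, 7}, and comp {1, 5}, this yields keys {1, 1, 1, 5, 5} and values {1, 2, 3, 6, 7}, which matches the desired result stated in the question.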

Thrust reduce not working with non equal input/output types

Question: I'm attempting to reduce the min and max of an array of values using Thrust and I seem to be stuck. Given an array of floats, I would like to reduce their min and max values in one pass, but using Thrust's reduce method I instead get the mother (or at least auntie) of all template compile errors. My original code contains 5 lists of values spread over 2 float4 arrays that I want reduced, but I've boiled it down to this short example.

    struct ReduceMinMax { __host__ __device__ float2
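A common way around a mismatch between input and result types is to split the work into a transform step (lift each float into a min/max pair) and a reduce step whose operator takes and returns the same pair type. The host-only sketch below illustrates that split with `std::accumulate`; the names `MinMax`, `lift`, and `merge` are hypothetical, and this is not the asker's `ReduceMinMax`. In Thrust, `thrust::transform_reduce` follows the same shape, and `thrust::minmax_element` may be enough when only one array is involved.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Accumulator type: smallest and largest value seen so far.
struct MinMax { float lo; float hi; };

// Transform step: lift a single value into a (min, max) pair.
MinMax lift(float v) { return MinMax{v, v}; }

// Reduce step: both arguments and the result share one type, so the
// operator can combine partial results in any grouping.
MinMax merge(const MinMax& a, const MinMax& b) {
    return MinMax{a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

// Single pass over a non-empty vector, seeded with the first element.
MinMax min_max(const std::vector<float>& values) {
    assert(!values.empty());
    return std::accumulate(
        values.begin() + 1, values.end(), lift(values.front()),
        [](const MinMax& acc, float v) { return merge(acc, lift(v)); });
}
```

Keeping `merge` closed over a single type matters on the GPU, where a parallel reduction combines partial results with each other and not only with raw input elements.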

thrust count occurrence [duplicate]

Question: This question already has answers here. Closed 7 years ago. Possible duplicate: Counting occurences of numbers in cuda array

Is there a way to use Thrust or CUDA to count the occurrences of duplicates in an array? For example, if I have a device vector {11, 11, 9, 1, 3, 11, 1, 2, 9, 1, 11}, I should get 1:3, 2:1, 3:1, 9:2, 11:4. If Thrust cannot do that, how can I use a kernel to do it? Thanks! I am doing a concentration calculation; that's why I am asking this question. Assume there are 100000
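One way to get such counts is to sort the values and then count the length of each run of equal elements. The host-only sketch below shows that idea with the standard library; `count_occurrences` is a hypothetical name. A possible Thrust mapping, offered as an assumption and not taken from the linked answers, is `thrust::sort` followed by `thrust::reduce_by_key` over a `thrust::constant_iterator<int>` of ones.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns (value, count) pairs in ascending order of value.
// The input is taken by value because it is sorted internally.
std::vector<std::pair<int, int>> count_occurrences(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    std::vector<std::pair<int, int>> counts;
    for (int v : values) {
        if (!counts.empty() && counts.back().first == v) {
            ++counts.back().second;   // same run: extend it
        } else {
            counts.emplace_back(v, 1); // new run starts here
        }
    }
    return counts;
}
```

For the vector in the question, {11, 11, 9, 1, 3, 11, 1, 2, 9, 1, 11}, this returns (1, 3), (2, 1), (3, 1), (9, 2), (11, 4).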

Thrust transform throws error: “bulk_kernel_by_value: an illegal memory access was encountered”

Question: I'm rather new to CUDA/Thrust and have a problem with a code snippet. To make it easier, I have trimmed it down to the bare minimum. The code is the following:

    struct functor{
      functor(float (*g)(const float&)) : _g{g} {}
      __host__ __device__ float operator()(const float& x) const { return _g(x); }
    private:
      float (*_g)(const float&);
    };
    __host__ __device__ float g(const float& x){return 3*x;}
    int main(void){
      thrust::device_vector<float> X(4,1);
      thrust::transform(X.begin(), X.end(), X.begin(),

How to pass a vector to the constructor of a thrust-based odeint observer, such that it can be read within the functor

Question: I am extending the parameter study example from Boost's odeint used with Thrust, and I do not know how to pass a vector of values to the constructor of the observer such that those values can be accessed (read-only) from within the observer's functor. The following is the code just for the observer.

    //// Observes the system, comparing the current state to
    //// values in unchangingVector
    struct minimum_perturbation_observer {
      struct minPerturbFunctor {
        template< class T >
        __host__ __device__