thrust | 易学教程

Making the number of key occurrences equal using CUDA / Thrust

Is there an efficient way to take a sorted key/value array pair and ensure that each key has an equal number of elements using the CUDA Thrust library? For instance, assume we have the following pair of arrays:

    ID: 1 2 2 3 3 3
    VN: 6 7 8 5 7 8

If we want to have two of each key appear, this would be the result:

    ID: 2 2 3 3
    VN: 7 8 5 7

The actual arrays will be much larger, containing millions of elements or more. I'm able to do this using nested for-loops easily, but I'm interested in knowing whether or not there's a more efficient way to convert the arrays using a GPU. Thrust seems as though

Bad GPU performance when compiling with -G parameter with nvcc compiler

I am running some tests and noticed that compiling with the -G parameter gives me worse performance than compiling without it. I have checked the Nvidia documentation:

    --device-debug (-G)    Generate debug information for device code.

But this does not explain why the performance is so bad. Where is this debug information generated, and when? And what could be causing the bad performance?

Using the -G switch disables most compiler optimizations that nvcc might do in device code. The resulting code will often run slower than code that is not compiled with -G

How to pass an array of vectors to cuda kernel?

I now have

    thrust::device_vector<int> A[N];

and my kernel function

    __global__ void kernel(...) { auto a = A[threadIdx.x]; }

I know that via thrust::raw_pointer_cast I could pass a device_vector to a kernel. But how could I pass an array of vectors to it?

talonmies: The really short answer is that you basically can't, and the longer answer is that you really shouldn't even if you discover or are presented with a hacky way of doing this. And in the spirit of that advice, what you can do is something like this:

    thrust::device_vector<int> A(N);
    thrust::device_vector<int> B(N);
    thrust::device_vector

Thrust copy - OutputIterator column-major order

I have a vector of matrices (stored as column-major arrays) that I want to concatenate vertically. To do so, I want to use the copy function from the Thrust library, as in the following example snippet:

    int offset = 0;
    for (int i = 0; i < matrices.size(); ++i) {
        thrust::copy(
            thrust::device_ptr<float>(matrices[i]),
            thrust::device_ptr<float>(matrices[i]) + rows[i] * cols[i],
            thrust::device_ptr<float>(result) + offset
        );
        offset += rows[i] * cols[i];
    }

EDIT: extended example: The problem is that if I have a matrix A = [[1, 2, 3], [4, 5, 6]] (2 rows, 3 cols; in memory [1, 4, 2, 5, 3, 6]) and

VS program crashes in debug but not release mode?

I am running the following program in VS 2012 to try out the Thrust function find:

    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include <thrust/find.h>
    #include <thrust/device_vector.h>
    #include <stdio.h>
    #include <iostream>  // needed for std::cout and std::endl

    int main()
    {
        thrust::device_vector<char> input(4);
        input[0] = 'a';
        input[1] = 'b';
        input[2] = 'c';
        input[3] = 'd';

        thrust::device_vector<char>::iterator iter;
        iter = thrust::find(input.begin(), input.end(), 'a');
        std::cout << "Index of a = " << iter - input.begin() << std::endl;
        return 0;
    }

This is a modified version of a code example taken from http://docs.thrust

Function object not working properly

I have defined the following function object:

    struct Predicate1
    {
        __device__ bool operator()(const DereferencedIteratorTuple& lhs, const DereferencedIteratorTuple& rhs)
        {
            using thrust::get;
            // if you do <=, returns last occurrence of largest element. < returns first
            if (get<0>(lhs) == get<2>(lhs) && get<0>(lhs) != 3)
                return get<1>(lhs) < get<1>(rhs);
            else
                return true;
        }
    };

where DereferencedIteratorTuple is defined as follows:

    typedef thrust::tuple<int, float, int> DereferencedIteratorTuple;

Moreover, I call it as follows:

    result = thrust::max_element(iter_begin, iter_end, Predicate1());

But the

Retain Duplicates with Set Intersection in CUDA

I'm using CUDA and Thrust to perform paired set operations. I would like to retain duplicates, however. For example:

    int keys[7] = {1, 1, 1, 3, 4, 5, 5};
    int vals[7] = {1, 2, 3, 4, 5, 6, 7};
    int comp[2] = {1, 5};
    thrust::set_intersection_by_key(keys, keys + 7, comp, comp + 2, vals, rk, rv);

Desired result:

    rk[1, 1, 1, 5, 5]
    rv[1, 2, 3, 6, 7]

Actual result:

    rk[1, 5]
    rv[5, 7]

I want all of the vals where the corresponding key is contained in comp. Is there any way to achieve this using Thrust, or do I have to write my own kernel or Thrust function? I'm using this function: set_intersection_by
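The desired result above amounts to a filter rather than a set intersection: keep every (key, value) pair whose key appears in comp. The following is a minimal host-side sketch of that filter in plain C++, not Thrust code. The function name filter_by_key_set is hypothetical, and comp is assumed to be sorted. A GPU version could plausibly be built from a vectorized thrust::binary_search that produces a stencil, followed by thrust::copy_if, but that mapping is an assumption and is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: keeps every (key, value) pair whose key occurs in comp.
// Assumes comp is sorted in ascending order and keys.size() == vals.size().
std::pair<std::vector<int>, std::vector<int>>
filter_by_key_set(const std::vector<int>& keys,
                  const std::vector<int>& vals,
                  const std::vector<int>& comp) {
    std::vector<int> rk;
    std::vector<int> rv;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // Membership test; duplicates in keys each pass independently.
        if (std::binary_search(comp.begin(), comp.end(), keys[i])) {
            rk.push_back(keys[i]);
            rv.push_back(vals[i]);
        }
    }
    return std::make_pair(rk, rv);
}
```

Because each element is tested on its own, duplicates in keys survive, which is the behaviour the question asks for.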

thrust count occurrence [duplicate]

Possible Duplicate: Counting occurrences of numbers in cuda array

Is there a way to use Thrust or CUDA to count the occurrences of duplicates in an array? For example, if I have a device vector

    { 11, 11, 9, 1, 3, 11, 1, 2, 9, 1, 11 }

I should get

    1:3  2:1  3:1  9:2  11:4

If Thrust cannot do that, how can I use a kernel to do it? Thanks!

I am doing a concentration calculation, which is why I am asking this question. Assume there are 100000 particles in a domain that has nx x ny x nz cells. I need to calculate the concentration of each cell (how many particles are in each cell). My kernel is this: __global
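One common way to get such a histogram is to sort the data and then count the length of each run of equal values. The sketch below does this on the host in plain C++ and is only a reference for checking results, not a GPU implementation. The name count_occurrences is hypothetical. In Thrust, the same two steps would likely correspond to thrust::sort followed by thrust::reduce_by_key with a constant value of 1 per element, which is an assumption about how one might port it.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: returns (value, count) pairs in ascending order of value.
// Takes the input by value because it sorts its own copy.
std::vector<std::pair<int, int>> count_occurrences(std::vector<int> data) {
    std::sort(data.begin(), data.end());  // equal values become adjacent

    std::vector<std::pair<int, int>> counts;
    for (int v : data) {
        if (!counts.empty() && counts.back().first == v) {
            ++counts.back().second;  // same run: extend the count
        } else {
            counts.emplace_back(v, 1);  // new run starts here
        }
    }
    return counts;
}
```

For the particle problem described above, the values being counted would be the cell index of each particle.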

Thrust reduce not working with non equal input/output types

I'm attempting to reduce the min and max of an array of values using Thrust and I seem to be stuck. Given an array of floats what I would like is to reduce their min and max values in one pass, but using thrust's reduce method I instead get the mother (or at least auntie) of all template compile errors. My original code contains 5 lists of values spread over 2 float4 arrays that I want reduced, but I've boiled it down to this short example.

    struct ReduceMinMax {
        __host__ __device__ float2 operator()(float lhs, float rhs) {
            return make_float2(Min(lhs, rhs), Max(lhs, rhs));
        }
    };

    int main(int
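A reduction operator has to accept and return the same type, because the result of one combine step is fed into the next. The functor above takes two floats and returns a float2, so it cannot be chained. A usual way around this is to first lift each float into a (min, max) pair and then combine pairs with pairs. The sketch below shows that idea on the host in plain C++. The names MinMax, to_minmax, combine and reduce_minmax are hypothetical. In Thrust the lift and the combine would probably be passed to thrust::transform_reduce as the unary and binary operators, and thrust::minmax_element is another option, but neither mapping is shown here.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical result type holding both extremes.
struct MinMax {
    float min;
    float max;
};

// Lift a single value into a (min, max) pair.
inline MinMax to_minmax(float x) { return MinMax{x, x}; }

// Associative combine whose arguments and result share one type.
inline MinMax combine(MinMax a, MinMax b) {
    return MinMax{std::min(a.min, b.min), std::max(a.max, b.max)};
}

// Single pass over v. Assumes v is not empty.
MinMax reduce_minmax(const std::vector<float>& v) {
    return std::accumulate(
        v.begin() + 1, v.end(), to_minmax(v.front()),
        [](MinMax acc, float x) { return combine(acc, to_minmax(x)); });
}
```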

Thrust Sort by key on the fly or different approach?

I was wondering if it is possible to sort by keys using the Thrust library without needing to create a vector to store the keys (on the fly). For example, I have the following two vectors, keys and values:

    vectorKeys:   0, 1, 2, 0, 1, 2, 0, 1, 2
    VectorValues: 10, 20, 30, 40, 50, 60, 70, 80, 90

After sorting by keys:

    thrust::sort_by_key(vKeys.begin(), vKeys.end(), vValues.begin());

the resulting vectors are:

    vectorKeys:   0, 0, 0, 1, 1, 1, 2, 2, 2
    VectorValues: 10, 40, 70, 20, 50, 80, 30, 60, 90

What I would like to know is whether it is possible to sort_by_key without the need for the vKeys vector (on the fly),
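When the key can be computed from an element's position, as in the example where the keys repeat 0, 1, 2, one option is to sort a permutation of indices with a comparator that computes the key on demand, then gather the values. The sketch below is a host-side illustration in plain C++, not Thrust code. The function key_of is a hypothetical stand-in for whatever rule produces the keys, and the approach still needs an index vector, so it trades key storage for index storage.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical key rule for the example data: keys repeat 0, 1, 2.
inline int key_of(std::size_t index) { return static_cast<int>(index % 3); }

// Returns the values reordered by key_of(original position).
// The stable sort keeps the original order among equal keys.
std::vector<int> sort_values_by_computed_key(const std::vector<int>& values) {
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...

    std::stable_sort(order.begin(), order.end(),
                     [](std::size_t a, std::size_t b) {
                         return key_of(a) < key_of(b);  // key computed on the fly
                     });

    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (std::size_t i : order) {
        sorted.push_back(values[i]);  // gather in sorted order
    }
    return sorted;
}
```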