thrust | 易学教程

Multi-item replacing in CUDA Thrust

Question: I have device vectors A, B and C as follows: A = [1,1,3,3,3,4,4,5,5] B = [1,3,5] C = [2,8,6] I want to replace each value of B that occurs in A with the corresponding element of C. For example, 1 is replaced by 2, 3 by 8 and 5 by 6, giving the following result: Result = [2,2,8,8,8,4,4,6,6] How do I achieve this in CUDA Thrust, or how could I implement it in CUDA C++? I found thrust::replace, but it replaces only one value per call. Since I need to replace a huge amount of data, it becomes a bottleneck.
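A common approach, sketched here in standard C++ under the assumption that B is sorted and its values are unique (as in the example): look up every element of A in B with a binary search and, on a hit, take the entry of C at the same position. All keys are then replaced in a single pass instead of one thrust::replace call per key. On the GPU the same per-element lookup could be done with thrust::transform and a functor that searches raw device pointers to B and C, or with the vectorized thrust::lower_bound followed by a transform that checks for an exact match. The name replace_all_keys is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: replaces every value of A that appears in the sorted key
// array B with the value at the same position in C. One pass over A and
// O(log |B|) work per element.
std::vector<int> replace_all_keys(const std::vector<int>& A,
                                  const std::vector<int>& B,   // sorted, unique keys
                                  const std::vector<int>& C) { // replacement values
    assert(B.size() == C.size());
    std::vector<int> result(A.size());
    std::transform(A.begin(), A.end(), result.begin(), [&](int x) {
        auto it = std::lower_bound(B.begin(), B.end(), x);
        if (it != B.end() && *it == x) {
            return C[it - B.begin()];  // key found: take the matching replacement
        }
        return x;                      // not a key: keep the original value
    });
    return result;
}
```

With the data from the question, replace_all_keys({1,1,3,3,3,4,4,5,5}, {1,3,5}, {2,8,6}) returns {2,2,8,8,8,4,4,6,6}; the value 4 is left unchanged because it is not in B.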

Thrust - accessing neighbors

Question: I would like to use Thrust's stream compaction functionality (copy_if) to extract the indices of elements from a vector if the elements satisfy a number of constraints. One of these constraints depends on the values of neighboring elements (8 in 2D and 26 in 3D). My question is: how can I obtain the neighbors of an element in Thrust? The function call operator of the functor for copy_if basically looks like: __host__ __device__ bool operator()(float x) { bool mark = x < 0.0f; if (mark
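One way to reach the neighbors is to run copy_if over element indices rather than over values: the functor then receives an index, holds a raw pointer to the whole grid plus its dimensions, and can read any neighbor. In Thrust that would mean passing a thrust::counting_iterator<int> as the input range and thrust::raw_pointer_cast(grid.data()) into the functor. The standard C++ sketch below shows that structure for the 2D, 8-neighbor case. The second condition (keep a negative cell only if at least one neighbor is non-negative) is a hypothetical stand-in, since the question's actual constraints are cut off; NeighborPredicate and select_indices are illustrative names.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// Predicate over a cell *index*, so it can inspect the 8 neighbors in 2D.
// In Thrust this would be a __host__ __device__ functor holding a raw device pointer.
struct NeighborPredicate {
    const float* grid;  // row-major grid of width * height values
    int width;
    int height;

    bool operator()(int idx) const {
        int x = idx % width;
        int y = idx / width;
        if (!(grid[idx] < 0.0f)) return false;  // first constraint from the question
        // Hypothetical second constraint: at least one neighbor is >= 0.
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;  // skip the cell itself
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;  // out of bounds
                if (grid[ny * width + nx] >= 0.0f) return true;
            }
        }
        return false;
    }
};

// Stream compaction over indices: returns the indices of all cells that pass.
std::vector<int> select_indices(const std::vector<float>& grid, int width, int height) {
    std::vector<int> indices(grid.size());
    std::iota(indices.begin(), indices.end(), 0);  // stands in for thrust::counting_iterator
    std::vector<int> out;
    std::copy_if(indices.begin(), indices.end(), std::back_inserter(out),
                 NeighborPredicate{grid.data(), width, height});
    return out;
}
```

Because the predicate only reads the grid, every index can be tested independently, which is what makes this pattern safe to run in parallel; the 3D, 26-neighbor case adds a dz loop and a depth dimension.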

How to make a strided chunk iterator in Thrust (CUDA)

Question: I need an iterator class like this one, https://github.com/thrust/thrust/blob/master/examples/strided_range.cu, except that the new iterator should produce the sequence [k * size_stride, k * size_stride+1, ..., k * size_stride+size_chunk-1, ...] with k = 0,1,...,N. Example: size_stride = 8, size_chunk = 3, N = 3; then the sequence is [0,1,2,8,9,10,16,17,18,24,25,26]. I don't know how to do this efficiently... Answer 1: The strided range iterator is basically a carefully crafted permutation iterator with a functor that
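Following the permutation-iterator idea from the answer, the chunked sequence only needs a different index map: position i goes to (i / size_chunk) * size_stride + (i % size_chunk). In Thrust that map would be a functor applied through thrust::make_transform_iterator over a thrust::counting_iterator, with the result used as the index range of thrust::make_permutation_iterator. The standard C++ sketch below only checks the index arithmetic; chunk_stride_index and chunk_stride_sequence are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Index map for the chunked-stride pattern: position i in the virtual range
// maps to element (i / chunk) * stride + (i % chunk) of the underlying data.
struct chunk_stride_index {
    std::size_t stride;
    std::size_t chunk;
    std::size_t operator()(std::size_t i) const {
        return (i / chunk) * stride + (i % chunk);
    }
};

// Materializes the first `count` indices. In Thrust they would instead be produced
// lazily by a transform_iterator and fed to a permutation_iterator.
std::vector<std::size_t> chunk_stride_sequence(std::size_t stride, std::size_t chunk,
                                               std::size_t count) {
    chunk_stride_index f{stride, chunk};
    std::vector<std::size_t> seq(count);
    for (std::size_t i = 0; i < count; ++i) seq[i] = f(i);
    return seq;
}
```

With size_stride = 8, size_chunk = 3 and k = 0,...,3, the range has (N + 1) * size_chunk = 12 positions, and chunk_stride_sequence(8, 3, 12) yields 0, 1, 2, 8, 9, 10, 16, 17, 18, 24, 25, 26.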

How to compile a Thrust and C++ project?

Question: I have big problems compiling when I use Thrust and C++ together. Here is the project structure (just a test project): sortbase.h #include<iostream> #include <thrust/device_vector.h> using namespace std; template<class T> class SortBase { public: void Init() { } void resize(const int &x) { CV.resize(x); cout<<"resize succeed!"<<endl; } private: thrust::device_vector<T> CV; }; sorter.h #ifndef __SORTER_H__ #define __SORTER_H__ #include <iostream> #include <thrust/device_vector.h
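A common workaround for mixed builds, offered as a hedged sketch: keep Thrust headers out of any header that a .cpp file (compiled by the host compiler) includes, and hide the thrust::device_vector behind an opaque implementation pointer whose definition lives in a .cu file compiled by nvcc, with explicit instantiations for the element types that are needed. To keep the example self-contained, std::vector stands in for thrust::device_vector in the "implementation" part; the size() member, the Impl struct and the choice of float are additions for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// ---- sortbase.h: included by .cpp files, so it must not include Thrust ----
template <class T>
class SortBase {
public:
    SortBase();
    ~SortBase();
    void resize(std::size_t n);
    std::size_t size() const;
private:
    struct Impl;                  // complete type visible only in sortbase.cu
    std::unique_ptr<Impl> impl_;
};

// ---- sortbase.cu: compiled by nvcc; would hold thrust::device_vector<T> ----
// std::vector is only a stand-in so this sketch compiles without CUDA.
template <class T>
struct SortBase<T>::Impl {
    std::vector<T> cv;            // thrust::device_vector<T> cv; in the real .cu file
};

template <class T> SortBase<T>::SortBase() : impl_(new Impl) {}
template <class T> SortBase<T>::~SortBase() = default;  // defined where Impl is complete
template <class T> void SortBase<T>::resize(std::size_t n) { impl_->cv.resize(n); }
template <class T> std::size_t SortBase<T>::size() const { return impl_->cv.size(); }

// Explicit instantiation: .cpp files can use SortBase<float> without seeing the .cu code.
template class SortBase<float>;
```

The design choice here is that only the .cu translation unit ever sees device code, so the host compiler never has to parse Thrust's CUDA backend; the price is one explicit instantiation line per element type.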

Reduce multiple blocks of equal length that are arranged in a big vector using CUDA

Question: I am looking for a fast way to reduce multiple blocks of equal length that are arranged as one big vector. I have N subarrays (of contiguous elements) arranged in one big array. Each subarray has a fixed size K, so the size of the whole array is N*K. What I'm doing now is calling the kernel N times; each call computes the reduction of one subarray as follows. I iterate over all the subarrays contained in the big vector: for(i=0;i<N;i++){ thrust::device_vector< float > Vec
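Instead of one launch per subarray, a single segmented reduction can handle all N blocks: give element i the key i / K and reduce by key. One common way to write this in Thrust is thrust::reduce_by_key with the keys generated on the fly by thrust::make_transform_iterator(thrust::counting_iterator<int>(0), f), where the functor f divides its argument by K. The standard C++ sketch below spells out what that call computes for contiguous, equal-length segments; segment_sums is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums each contiguous block of length K in `data` (size N * K) in one pass.
// The key of element i is i / K, which is what a Thrust transform_iterator over
// a counting_iterator would feed to thrust::reduce_by_key.
std::vector<float> segment_sums(const std::vector<float>& data, std::size_t K) {
    assert(K > 0 && data.size() % K == 0);
    std::vector<float> sums(data.size() / K, 0.0f);
    for (std::size_t i = 0; i < data.size(); ++i) {
        sums[i / K] += data[i];  // elements sharing the key i / K are reduced together
    }
    return sums;
}
```

For data = {1, 2, 3, 4, 5, 6} and K = 3 this returns {6, 15}: one result per subarray from a single traversal.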

Writing a simple thrust functor operating on some zipped arrays

Question: I am trying to perform a thrust::reduce_by_key using zip and permutation iterators, i.e. doing this on a zipped array of several 'virtual' permuted arrays. I am having trouble writing the syntax for the functor density_update. But first, the setup of the problem. Here is my function call: thrust::reduce_by_key( dflagt, dflagtend, thrust::make_zip_iterator( thrust::make_tuple( thrust::make_permutation_iterator(dmasst, dmapt), thrust::make_permutation_iterator(dvelt, dmapt), thrust::make
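For reduce_by_key over a zip iterator, the binary functor receives two tuples (one per element being combined) and must return a tuple of the same shape. The sketch below uses std::tuple and a small hand-written reduce_by_key so that it runs without CUDA; in Thrust the functor would take and return thrust::tuple, read fields with thrust::get, and have its operator() marked __host__ __device__. The body of density_update (component-wise sums of mass and velocity) is a hypothetical stand-in, because the question is cut off before showing it.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

using MassVel = std::tuple<float, float>;  // (mass, velocity) for one zipped element

// Hypothetical binary operator: component-wise sum of two zipped elements.
struct density_update {
    MassVel operator()(const MassVel& a, const MassVel& b) const {
        return MassVel(std::get<0>(a) + std::get<0>(b),
                       std::get<1>(a) + std::get<1>(b));
    }
};

// Minimal host reduce_by_key: combines runs of equal consecutive keys with `op`.
template <class Key, class Value, class Op>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_key(const std::vector<Key>& keys, const std::vector<Value>& values, Op op) {
    std::vector<Key> out_keys;
    std::vector<Value> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out_keys.empty() && out_keys.back() == keys[i]) {
            out_values.back() = op(out_values.back(), values[i]);  // same run: combine
        } else {
            out_keys.push_back(keys[i]);                            // a new run starts
            out_values.push_back(values[i]);
        }
    }
    return {std::move(out_keys), std::move(out_values)};
}
```

The key point carried over to Thrust is the signature: tuple in, tuple in, tuple out, with one tuple field per zipped array.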

How to estimate GPU memory requirements for thrust based implementation?

Question: I have 3 different Thrust-based implementations that perform certain calculations: the first is the slowest and requires the least GPU memory, the second is the fastest and requires the most GPU memory, and the third is in between. For each of them I know the size and data type of every device vector used, so I use vector.size()*sizeof(type) to roughly estimate the memory needed for storage. So for a given input, based on its size, I would like to decide which implementation to use. In
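A rough decision rule, sketched under stated assumptions: add up size() * sizeof(element) for every device vector an implementation keeps alive at the same time, inflate the total by a safety margin for temporary buffers that some Thrust algorithms (for example thrust::sort) may allocate internally, and pick the fastest implementation whose estimate fits into the free memory reported by cudaMemGetInfo(&free, &total). The 25% margin, the Buffer struct, the function names and the fastest-to-slowest ordering are assumptions; the free-memory figure is passed in as a parameter so the sketch runs without CUDA.

```cpp
#include <cstddef>
#include <vector>

// One device buffer an implementation keeps alive: element count and element size.
struct Buffer {
    std::size_t count;
    std::size_t elem_size;  // e.g. sizeof(float)
};

// Sum of size() * sizeof(T) over all buffers, plus an assumed safety margin
// for temporaries allocated inside Thrust algorithms.
std::size_t estimate_bytes(const std::vector<Buffer>& buffers, double margin = 0.25) {
    std::size_t bytes = 0;
    for (const Buffer& b : buffers) bytes += b.count * b.elem_size;
    return static_cast<std::size_t>(bytes * (1.0 + margin));
}

// `estimates` is ordered from fastest to slowest implementation. Returns the index
// of the fastest one that fits in `free_bytes`, or -1 if none fits.
int pick_implementation(const std::vector<std::size_t>& estimates, std::size_t free_bytes) {
    for (std::size_t i = 0; i < estimates.size(); ++i) {
        if (estimates[i] <= free_bytes) return static_cast<int>(i);
    }
    return -1;
}
```

Since the margin is a guess, it is worth measuring the real peak usage of each implementation on a few representative inputs and adjusting it.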

Nested C++ templates

Question: I have a function called add_vector_to_scalar which adds a scalar value to a vector (in) and stores the result in another vector (out). I am learning C++, so I am not sure how to make the type parameter of add_op generic. I thought about adding another typename T, but it did not work. template<typename Vector> void add(Vector& in, Vector& out, T& c) { transform(in.begin(), in.end(), out.begin(), add_op<int>(c)); } The vector could be of two types: device_vector<T> host_vector<T> The add_op
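Both thrust::device_vector<T> and thrust::host_vector<T> expose their element type as the nested typedef value_type, so the scalar type can be derived from Vector instead of being a separate template parameter; note the typename keyword that a dependent type requires. The sketch below uses std::vector so it runs without CUDA. In Thrust, add_op's operator() would also need __host__ __device__ to be callable from device code, and std::transform would become thrust::transform.

```cpp
#include <algorithm>
#include <vector>

// Functor that adds a fixed scalar of type T to its argument.
template <typename T>
struct add_op {
    T c;
    explicit add_op(T c_) : c(c_) {}
    T operator()(const T& x) const { return x + c; }
};

// The scalar type is taken from the container, so no separate T parameter is needed.
template <typename Vector>
void add(const Vector& in, Vector& out, typename Vector::value_type c) {
    using T = typename Vector::value_type;
    std::transform(in.begin(), in.end(), out.begin(), add_op<T>(c));
}
```

Vector is deduced from in and out; the third parameter is a non-deduced context, so a call such as add(in, out, 10) simply converts 10 to the element type.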

Thrust not calling device function

Question: I have the following simple CUDA-Thrust code, which adds 10 to a device vector, but the function is called on the host side instead of on the device. #include <algorithm> #include <iostream> #include <numeric> #include <vector> #include <stdio.h> #include <thrust/device_vector.h> __host__ __device__ int add(int x){ #if defined(__CUDA_ARCH__) printf("In device\n"); #else printf("In host\n"); #endif return x+10; } int main(void) { thrust::host_vector<int> H(4); H[0] = H[1] = H[2] = H[3] = 10; thrust:

Iterator for a vector of structures in Thrust

Question: I'm trying to access vector elements in this manner: struct point { unsigned int x; unsigned int y; }; ... thrust::device_vector<point> devPoints(hPoints.begin(), hPoints.end()); for(thrust::device_vector<point>::iterator iter = devPoints.begin(); iter != devPoints.end(); iter++) { std::cout << iter->x << " " << iter->y << " " << std::endl; (1) } The device_vector was initialized properly. I get the following errors: error: expression must have pointer type (at 1) error: no suitable user