thrust | 易学教程 (Easy Learning Tutorials)

CUDA Thrust reduction with double2 arrays

Question: I have the following (compilable and executable) code that uses CUDA Thrust to perform reductions of float2 arrays. It works correctly:

    using namespace std;

    // includes, system
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <math.h>
    #include <conio.h>
    #include <typeinfo>
    #include <iostream>

    // includes CUDA
    #include <cuda.h>
    #include <cuda_runtime.h>

    // includes Thrust
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>

    // float2 +

How to implement nested loops in CUDA Thrust

I currently have to run a nested loop as follows:

    for(int i = 0; i < N; i++){
        for(int j = i+1; j <= N; j++){
            compute(...)//some calculation here
        }
    }

I've tried keeping the outer loop on the CPU and running the inner loop on the GPU, but that results in too many memory accesses. Is there another way to do it, for example with thrust::reduce_by_key? The whole program is here:

    #include <thrust/device_vector.h>
    #include <thrust/host_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/binary_search.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/random.h>
    #include

Thrust: Removing duplicates in key-value arrays

I have a pair of arrays of equal size, which I will call keys and values. For example:

    K: V
    1: 99
    1: 100
    1: 100
    1: 100
    1: 103
    2: 103
    2: 105
    3: 45
    3: 67

The keys are sorted, and the values associated with each key are sorted. How do I remove the duplicate values associated with each key, along with their corresponding keys? That is, I want to compact the above to:

    1: 99
    1: 100
    1: 103
    2: 103   <-- This should remain, since the key is different
    2: 105
    3: 45
    3: 67

I looked at the stream compaction functions available in Thrust, but was not able to find anything that does this. Is this possible with Thrust? Or do I

Is there a better and faster way to copy from CPU memory to the GPU using Thrust?

Recently I have been using Thrust a lot. I have noticed that to use Thrust, one must always copy the data from CPU memory to GPU memory. Consider the following example:

    int foo(int *foo)
    {
        host_vector<int> m(foo, foo+ 100000);
        device_vector<int> s = m;
    }

I'm not quite sure how the host_vector constructor works, but it seems I'm copying the initial data, coming from *foo, twice: once into the host_vector when it is initialized, and again when the device_vector is initialized. Is there a better way of copying from CPU to GPU without making an intermediate copy?

thrust set_difference fails to compile with "calling a __host__ function from a __host__ __device__ function is not allowed"

Question: I have two sets, A and B, of 20 and 10 integers respectively. B is a subset of A. I need to find the complementary set of B. I use thrust::set_difference to find the set difference; however, it fails to compile with the message:

    warning: calling a __host__ function from a __host__ __device__ function is not allowed

My code is below. I don't know why this simple code fails to compile.

    #include <thrust/sequence.h>
    #include <thrust/execution_policy.h>
    #include <thrust/set_operations.h>
    #include <thrust/device_vector.h>

    thrust::device_vector<int> find_complimentary_set(thrust::device_vector<int> A,

Making the number of key occurrences equal using CUDA / Thrust

Question: Is there an efficient way to take a sorted key/value array pair and ensure that each key has an equal number of elements using the CUDA Thrust library? For instance, assume we have the following pair of arrays:

    ID: 1 2 2 3 3 3
    VN: 6 7 8 5 7 8

If we want two of each key to appear, this would be the result:

    ID: 2 2 3 3
    VN: 7 8 5 7

The actual arrays will be much larger, containing millions of elements or more. I'm able to do this easily using nested for-loops, but I'm interested in knowing

Bad GPU performance when compiling with -G parameter with nvcc compiler

Question: I am doing some tests and noticed that compiling with the -G parameter gives me worse performance than compiling without it. I have checked NVIDIA's documentation:

    --device-debug (-G)    Generate debug information for device code.

But it doesn't help me understand why performance is so much worse. Where and when is this debug information generated, and what could be causing the slowdown?

Answer 1: Using the -G switch disables most compiler optimizations that

VS program crashes in debug but not release mode?

Question: I am running the following program in VS 2012 to try out the Thrust function find:

    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include <thrust/find.h>
    #include <thrust/device_vector.h>
    #include <stdio.h>

    int main()
    {
        thrust::device_vector<char> input(4);
        input[0] = 'a';
        input[1] = 'b';
        input[2] = 'c';
        input[3] = 'd';
        thrust::device_vector<char>::iterator iter;
        iter = thrust::find(input.begin(), input.end(), 'a');
        std::cout << "Index of a = " << iter - input.begin() << std

Operating on thrust::complex types with thrust::transform

I'm trying to use thrust::transform to operate on vectors of type thrust::complex<float>, without success. The following example blows up during compilation with several pages of errors:

    #include <cuda.h>
    #include <cuda_runtime.h>
    #include <cufft.h>
    #include <thrust/device_vector.h>
    #include <thrust/host_vector.h>
    #include <thrust/transform.h>
    #include <thrust/complex.h>

    int main(int argc, char *argv[])
    {
        thrust::device_vector< thrust::complex<float> > d_vec1(4);
        thrust::device_vector<float> d_vec2(4);
        thrust::fill(d_vec1.begin(), d_vec1.end(), thrust::complex<float>(1,1));
        thrust::transform(d