thrust

CUDA Thrust reduction with double2 arrays

烈酒焚心 submitted on 2019-12-04 03:50:16
Question: I have the following (compilable and executable) code that uses CUDA Thrust to perform reductions of float2 arrays. It works correctly.

using namespace std;
// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <conio.h>
#include <typeinfo>
#include <iostream>
// includes CUDA
#include <cuda.h>
#include <cuda_runtime.h>
// includes Thrust
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
// float2 +

How to implement nested loops in cuda thrust

吃可爱长大的小学妹 submitted on 2019-12-03 21:51:07
I currently have to run a nested loop as follows:

for (int i = 0; i < N; i++) {
    for (int j = i + 1; j <= N; j++) {
        compute(...); // some calculation here
    }
}

I tried keeping the outer loop on the CPU and running the inner loop on the GPU, but that results in too many memory accesses. Is there another way to do it, for example with thrust::reduce_by_key? The whole program is here:

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include
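One way to hand a triangular loop like this to a flat parallel primitive is to number the (i, j) pairs from 0 to N(N+1)/2 - 1 and let each work item decode its own pair. The sketch below is host-only standard C++ and shows just the index arithmetic. The names pair_count, row_start and decode_pair are hypothetical, and since the body of compute(...) is not shown above, whether this layout actually reduces memory traffic is an open assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Number of (i, j) pairs visited by: for i in [0, N), for j in [i + 1, N].
std::int64_t pair_count(std::int64_t N) { return N * (N + 1) / 2; }

// Flat index of the first pair in row i. Row r holds N - r pairs, so this is
// the sum of (N - r) for r in [0, i), which simplifies to i * (2N - i + 1) / 2.
std::int64_t row_start(std::int64_t i, std::int64_t N) {
    return i * (2 * N - i + 1) / 2;
}

// Maps a flat index k back to the (i, j) pair the nested loop would visit k-th.
std::pair<std::int64_t, std::int64_t> decode_pair(std::int64_t k, std::int64_t N) {
    assert(k >= 0 && k < pair_count(N));
    // Binary search for the largest row i whose first flat index is <= k.
    std::int64_t lo = 0, hi = N - 1;
    while (lo < hi) {
        std::int64_t mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
        if (row_start(mid, N) <= k) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    const std::int64_t i = lo;
    const std::int64_t j = i + 1 + (k - row_start(i, N));
    return {i, j};
}
```

In a Thrust program the flat indices could come from a thrust::counting_iterator (a header the program above already includes), with a functor that calls the decode step before doing the per-pair work.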

Thrust: Removing duplicates in key-value arrays

情到浓时终转凉″ submitted on 2019-12-03 08:56:15
I have a pair of arrays of equal size, which I will call keys and values. For example:

K: V
1: 99
1: 100
1: 100
1: 100
1: 103
2: 103
2: 105
3: 45
3: 67

The keys are sorted, and the values associated with each key are sorted. How do I remove the duplicate values associated with each key, along with their corresponding keys? That is, I want to compact the above to:

1: 99
1: 100
1: 103
2: 103 <-- This should remain, since the key is different
2: 105
3: 45
3: 67

I looked at the stream compaction functions available in Thrust, but was not able to find anything that does this. Is this possible with Thrust? Or do I
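The rule described above amounts to treating each (key, value) pair as a single element and dropping every element equal to its predecessor. The following is a minimal host-only sketch of that rule in standard C++, not a Thrust implementation; dedupe_pairs is a hypothetical helper name. In Thrust, the same comparison can be expressed by running thrust::unique over a zip_iterator that pairs the two arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust): compacts two parallel arrays in
// place, keeping the first element of every run of identical (key, value) pairs.
void dedupe_pairs(std::vector<int>& keys, std::vector<int>& values) {
    assert(keys.size() == values.size());
    if (keys.empty()) return;

    std::size_t out = 1;  // the first pair always survives
    for (std::size_t in = 1; in < keys.size(); ++in) {
        // A pair is a duplicate only if BOTH its key and its value match the
        // most recently kept pair, so "2: 103" survives after "1: 103".
        const bool same = keys[in] == keys[out - 1] &&
                          values[in] == values[out - 1];
        if (!same) {
            keys[out] = keys[in];
            values[out] = values[in];
            ++out;
        }
    }
    keys.resize(out);
    values.resize(out);
}
```

Calling dedupe_pairs on the nine-element example from the question leaves the seven pairs listed as the desired result.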

Is there a better and faster way to copy from CPU memory to GPU using Thrust?

喜你入骨 submitted on 2019-12-03 00:20:09
Recently I have been using Thrust a lot. I have noticed that, in order to use Thrust, one must always copy the data from CPU memory to GPU memory. Consider the following example:

int foo(int *foo) {
    host_vector<int> m(foo, foo + 100000);
    device_vector<int> s = m;
}

I'm not quite sure how the host_vector constructor works, but it seems that I'm copying the initial data, pointed to by foo, twice: once into the host_vector when it is initialized, and again when the device_vector is initialized. Is there a better way of copying from CPU to GPU without making intermediate copies of the data?

thrust set difference fails to compile with calling a __host__ function from a __host__ __device__ function is not allowed

允我心安 submitted on 2019-12-02 22:54:35
Question: I have two sets, A and B, of 20 and 10 integers respectively. B is a subset of A. I need to find the complementary set of B. I use thrust::set_difference to find the set difference; however, it fails to compile with the message:

warning: calling a __host__ function from a __host__ __device__ function is not allowed

My code is below. I don't know why this simple code fails to compile.

#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <thrust/set_operations.h>
#include <thrust

Making the number of key occurrences equal using CUDA / Thrust

冷暖自知 submitted on 2019-12-02 18:31:50
Question: Is there an efficient way to take a sorted key/value array pair and ensure that each key has an equal number of elements using the CUDA Thrust library? For instance, assume we have the following pair of arrays:

ID: 1 2 2 3 3 3
VN: 6 7 8 5 7 8

If we want two of each key to appear, this would be the result:

ID: 2 2 3 3
VN: 7 8 5 7

The actual arrays will be much larger, containing millions of elements or more. I'm able to do this easily using nested for-loops, but I'm interested in knowing
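Read against the example above, the rule is: drop every key that occurs fewer than n times, and keep only the first n entries of each remaining key. Below is a sequential host-only sketch of that rule in standard C++, intended only to pin down the expected output; equalize_keys is a hypothetical name. One plausible GPU decomposition, which is an assumption and not taken from the thread, would compute per-key counts with thrust::reduce_by_key, a rank within each run with thrust::exclusive_scan_by_key, and then filter with thrust::copy_if.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for sorted ids, keeps the first n entries of every key
// that occurs at least n times and drops all other entries, in place.
void equalize_keys(std::vector<int>& ids, std::vector<int>& vals, std::size_t n) {
    assert(ids.size() == vals.size());

    std::size_t out = 0;        // next free slot in the compacted output
    std::size_t run_begin = 0;  // start of the current run of equal keys
    while (run_begin < ids.size()) {
        // Find the end of the run of entries sharing ids[run_begin].
        std::size_t run_end = run_begin;
        while (run_end < ids.size() && ids[run_end] == ids[run_begin]) {
            ++run_end;
        }
        // Keep the first n entries only if the run is long enough.
        if (run_end - run_begin >= n) {
            for (std::size_t r = 0; r < n; ++r) {
                ids[out] = ids[run_begin + r];
                vals[out] = vals[run_begin + r];
                ++out;
            }
        }
        run_begin = run_end;
    }
    ids.resize(out);
    vals.resize(out);
}
```

With the ID and VN arrays from the question and n = 2, the helper leaves ID: 2 2 3 3 and VN: 7 8 5 7, matching the result given there.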

Bad GPU performance when compiling with -G parameter with nvcc compiler

你。 submitted on 2019-12-02 16:23:06
Question: I am doing some tests and I realized that compiling with the -G parameter gives me worse performance than compiling without it. I checked the NVIDIA documentation:

--device-debug (-G) Generate debug information for device code.

But that does not help me understand why the performance is so bad. Where is this debug information generated, and when? What could be causing the poor performance?

Answer 1: Using the -G switch disables most compiler optimizations that

VS program crashes in debug but not release mode?

僤鯓⒐⒋嵵緔 submitted on 2019-12-02 16:00:40
Question: I am running the following program in VS 2012 to try out the Thrust function find:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <thrust/find.h>
#include <thrust/device_vector.h>
#include <stdio.h>

int main() {
    thrust::device_vector<char> input(4);
    input[0] = 'a';
    input[1] = 'b';
    input[2] = 'c';
    input[3] = 'd';
    thrust::device_vector<char>::iterator iter;
    iter = thrust::find(input.begin(), input.end(), 'a');
    std::cout << "Index of a = " << iter - input.begin() << std

thrust set difference fails to compile with calling a __host__ function from a __host__ __device__ function is not allowed

大城市里の小女人 submitted on 2019-12-02 13:56:16
I have two sets, A and B, of 20 and 10 integers respectively. B is a subset of A. I need to find the complementary set of B. I use thrust::set_difference to find the set difference; however, it fails to compile with the message:

warning: calling a __host__ function from a __host__ __device__ function is not allowed

My code is below. I don't know why this simple code fails to compile.

#include <thrust/sequence.h>
#include <thrust/execution_policy.h>
#include <thrust/set_operations.h>
#include <thrust/device_vector.h>

thrust::device_vector<int> find_complimentary_set(thrust::device_vector<int> A,

Operating on thrust::complex types with thrust::transform

假装没事ソ submitted on 2019-12-02 13:38:35
I'm trying, without success, to use thrust::transform to operate on vectors of type thrust::complex<float>. The following example blows up during compilation with several pages of errors.

#include <cuda.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <thrust/complex.h>

int main(int argc, char *argv[]) {
    thrust::device_vector< thrust::complex<float> > d_vec1(4);
    thrust::device_vector<float> d_vec2(4);
    thrust::fill(d_vec1.begin(), d_vec1.end(), thrust::complex<float>(1,1));
    thrust::transform(d