thrust | 易学教程

Simple CUDA Thrust Program Error

Question: I just wrote a simple CUDA Thrust program, but when I run it I get this error: thrust::system::system_error at position 0x0037f99c. Can someone help me figure out why this happens? #include<thrust\host_vector.h> #include<thrust\device_vector.h> #include<iostream> using namespace std; using namespace thrust; int main() { thrust::host_vector<int> h_vec(3); h_vec[0]=1;h_vec[1]=2;h_vec[2]=3; thrust::device_vector<int> d_vec(3) ; d_vec= h_vec; int h_sum = thrust::reduce(h_vec.begin(), h_vec

thrust reduction result on device memory

Question: Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? If so, is it as easy as assigning the value to a cudaMalloc'ed area, or should I use a thrust::device_ptr? Answer 1: Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? The short answer is no. thrust::reduce returns a quantity, the result of the reduction. This quantity must be deposited in a host-resident variable: Take for example reduce,

Determining the least element and its position in each matrix column with CUDA Thrust

Question: I have a fairly simple problem but I cannot figure out an elegant solution to it. I have Thrust code which produces c vectors of the same size containing values. Let's say each of these c vectors has an index. For each vector position, I would like to get the index of the c vector for which the value is the lowest. Example: C0 = (0,10,20,3,40) C1 = (1,2,3,5,10) As a result I would get a vector containing the index of the C vector which has the lowest value: result = (0,1,1,0,1) I have thought
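The expected result in the example above can be checked against a small host-side reference. The sketch below is plain C++, not Thrust, and the helper name `argmin_across_vectors` is hypothetical (it does not come from the original post). It assumes all input vectors have the same length and breaks ties in favor of the lower vector index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not Thrust): for each position j, return
// the index of the input vector holding the smallest value at position j.
// Ties are resolved in favor of the lower vector index.
std::vector<int> argmin_across_vectors(const std::vector<std::vector<int>>& vectors) {
    assert(!vectors.empty());
    const std::size_t n = vectors.front().size();
    std::vector<int> result(n, 0);  // start by assuming vector 0 holds the minimum
    for (std::size_t k = 1; k < vectors.size(); ++k) {
        assert(vectors[k].size() == n);  // all vectors must have the same length
        for (std::size_t j = 0; j < n; ++j) {
            if (vectors[k][j] < vectors[result[j]][j]) {
                result[j] = static_cast<int>(k);
            }
        }
    }
    return result;
}
```

Called with C0 and C1 from the question, this returns (0,1,1,0,1). Each position is processed independently of the others, which is why the problem maps naturally onto a data-parallel formulation.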

How to asynchronously copy memory from the host to the device using thrust and CUDA streams

Question: I would like to copy memory from the host to the device using Thrust, as in thrust::host_vector<float> h_vec(1 << 28); thrust::device_vector<float> d_vec(1 << 28); thrust::copy(h_vec.begin(), h_vec.end(), d_vec.begin()); but using CUDA streams, analogously to how you would copy memory from the device to the device using streams: cudaStream_t s; cudaStreamCreate(&s); thrust::device_vector<float> d_vec1(1 << 28), d_vec2(1 << 28); thrust::copy(thrust::cuda::par.on(s), d_vec1.begin(), d_vec1.end(), d

thrust::max_element slow in comparison to cublasIsamax - More efficient implementation?

Question: I need a fast and efficient implementation for finding the index of the maximum value in an array in CUDA. This operation needs to be performed several times. I originally used cublasIsamax for this; however, it returns the index of the maximum absolute value, which is not what I want. Instead, I'm now using thrust::max_element, but it is rather slow in comparison to cublasIsamax. I use it in the following manner: //d_vector is a pointer on the device pointing to the beginning of
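The difference between the two results described above can be seen on the host with a short sketch. This is plain C++ using std::max_element, not Thrust or cuBLAS. The function names `index_of_max` and `index_of_max_abs` are hypothetical, and both return 0-based indices into a non-empty array.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: 0-based index of the largest value (first one on ties).
std::size_t index_of_max(const std::vector<float>& v) {
    assert(!v.empty());
    return static_cast<std::size_t>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Hypothetical helper: 0-based index of the largest absolute value,
// i.e. the kind of result the question says it does not want.
std::size_t index_of_max_abs(const std::vector<float>& v) {
    assert(!v.empty());
    auto it = std::max_element(v.begin(), v.end(),
                               [](float a, float b) { return std::fabs(a) < std::fabs(b); });
    return static_cast<std::size_t>(it - v.begin());
}
```

For the array {1, -5, 3}, `index_of_max` gives 2 while `index_of_max_abs` gives 1, so the two only agree when the largest-magnitude element is also the largest value, for example when all values are non-negative.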

How to normalize matrix columns in CUDA with max performance?

Question: How can I efficiently normalize matrix columns in CUDA? My matrix is stored in column-major order, and the typical size is 2000x200. The operation can be represented by the following MATLAB code. A = rand(2000,200); A = exp(A); A = A./repmat(sum(A,1), [size(A,1) 1]); Can this be done efficiently with Thrust, cuBLAS and/or cuNPP? A quick implementation using 4 kernels is shown as follows. I am wondering if this can be done in 1 or 2 kernels to improve the performance, especially for the column summation
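As a point of comparison for any GPU version, the MATLAB snippet above can be written as a host-side reference. The sketch below is plain C++, not the poster's kernels, and the function name `normalize_columns_exp` is hypothetical. It assumes column-major storage, so column c occupies elements c*rows through c*rows + rows - 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for:
//   A = exp(A); A = A ./ repmat(sum(A,1), [size(A,1) 1]);
// The matrix is stored in column-major order in a flat vector.
void normalize_columns_exp(std::vector<double>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        double* col = a.data() + c * rows;  // start of column c
        double sum = 0.0;
        // Pass 1: exponentiate in place and accumulate the column sum.
        for (std::size_t r = 0; r < rows; ++r) {
            col[r] = std::exp(col[r]);
            sum += col[r];
        }
        // Pass 2: divide every element of the column by its sum.
        for (std::size_t r = 0; r < rows; ++r) {
            col[r] /= sum;
        }
    }
}
```

After the call every column sums to 1. The two passes per column show why the column summation is the step that is hard to fuse: the division cannot start until the sum for that column is complete.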

Combining two lists by key using Thrust

Question: Given two key-value lists, I am trying to combine the two sides by matching the keys and applying a function to the two values when the keys match. In my case I want to multiply the values. A small example to make it clearer: Left keys: { 1, 2, 4, 5, 6 } Left values: { 3, 4, 1, 2, 1 } Right keys: { 1, 3, 4, 5, 6, 7 } Right values: { 2, 1, 1, 4, 1, 2 } Expected output keys: { 1, 4, 5, 6 } Expected output values: { 6, 1, 8, 1 } I have been able to implement this on the CPU using C++ using
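The poster's CPU implementation is cut off in this excerpt, so the following is only one possible host-side sketch, not their code. It assumes that both key lists are sorted in ascending order and contain no duplicates, as in the example. The names `KeyedValues` and `multiply_matching_keys` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: parallel arrays of keys and combined values.
struct KeyedValues {
    std::vector<int> keys;
    std::vector<int> values;
};

// Hypothetical helper: merge-style walk over two sorted, duplicate-free key
// lists. When a key appears on both sides, emit it with the product of the
// two values; keys present on only one side are skipped.
KeyedValues multiply_matching_keys(const std::vector<int>& left_keys,
                                   const std::vector<int>& left_values,
                                   const std::vector<int>& right_keys,
                                   const std::vector<int>& right_values) {
    assert(left_keys.size() == left_values.size());
    assert(right_keys.size() == right_values.size());
    KeyedValues out;
    std::size_t i = 0, j = 0;
    while (i < left_keys.size() && j < right_keys.size()) {
        if (left_keys[i] < right_keys[j]) {
            ++i;  // key only on the left
        } else if (right_keys[j] < left_keys[i]) {
            ++j;  // key only on the right
        } else {
            out.keys.push_back(left_keys[i]);
            out.values.push_back(left_values[i] * right_values[j]);
            ++i;
            ++j;
        }
    }
    return out;
}
```

With the lists from the question this yields keys { 1, 4, 5, 6 } and values { 6, 1, 8, 1 }. A reference like this is mainly useful for validating the output of a GPU version on small inputs.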

Using Thrust's reduce operator with Pixel uchar4 data error

Question: I have been having trouble converting this example from sort to reduce. I keep getting the error: no suitable conversion function from "uchar4" to "OutputType" exists, when I try to compile and run this modified example: thrust::reduce(tptr, tptr+(DIM*DIM), int(0), reduce_functor()); Is the crux of my issue the modified functor ... where I was trying to avoid adding chars and instead return the summed int value of the pixels, so I can later get the average color of the image ... #include <stdio.h>

Multiple occurrence subvector search with CUDA Thrust

Question: I want to find occurrences of a subvector in a device vector on the GPU, with the Thrust library. Say for an array str = "aaaabaaab", I need to find the occurrences of substr = "ab". How should I use the thrust::find function to search for a subvector? In a nutshell, how should I implement a string search algorithm with the Thrust library? Answer 1: I would agree with the comments provided that thrust doesn't provide a single function that does this in "typical thrust fashion" and you would not want to use a sequence of thrust
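To make the expected output concrete, here is a host-side sketch in plain C++ that reports every starting position of the pattern. It does not use Thrust and is not the approach taken in the truncated answer. The function name `find_all_occurrences` is hypothetical, and an empty pattern is treated as having no occurrences.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side reference: test each start position independently
// and record those where the pattern matches. Overlapping matches are kept.
std::vector<std::size_t> find_all_occurrences(const std::string& text,
                                              const std::string& pattern) {
    std::vector<std::size_t> positions;
    if (pattern.empty() || pattern.size() > text.size()) {
        return positions;  // nothing can match
    }
    const std::size_t last_start = text.size() - pattern.size();
    for (std::size_t i = 0; i <= last_start; ++i) {
        // Compare the pattern against the window of text starting at i.
        if (std::equal(pattern.begin(), pattern.end(), text.begin() + i)) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

For str = "aaaabaaab" and substr = "ab" this returns positions 3 and 7. Each start position is checked without reference to any other, so the same test could in principle be run by one thread per position.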

thrust operations empty host array

Question: I want to do some Thrust operations but I am not sure exactly how. Right now, I am receiving an array full of zeros (the h_a array). I have: #include <cstdio> #include <cstdlib> #include <cmath> #include <iostream> #include <cuda.h> #include <cuda_runtime_api.h> #include <thrust/device_ptr.h> #include <thrust/fill.h> #include <thrust/transform.h> #include <thrust/functional.h> #include <thrust/device_vector.h> #include <thrust/host_vector.h> #include <thrust/copy.h> #include <thrust