thrust

Simple CUDA Thrust Program Error

别说谁变了你拦得住时间么 submitted on 2019-12-18 09:45:42
Question: I just wrote a simple CUDA Thrust program, but when I run it I get this error: thrust::system::system_error at position 0x0037f99c. Can someone help me figure out why this happens? #include<thrust\host_vector.h> #include<thrust\device_vector.h> #include<iostream> using namespace std; using namespace thrust; int main() { thrust::host_vector<int> h_vec(3); h_vec[0]=1;h_vec[1]=2;h_vec[2]=3; thrust::device_vector<int> d_vec(3) ; d_vec= h_vec; int h_sum = thrust::reduce(h_vec.begin(), h_vec

thrust reduction result on device memory

余生长醉 submitted on 2019-12-18 06:56:16
Question: Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? If so, is it as easy as assigning the value to a cudaMalloc'ed area, or should I use a thrust::device_ptr? Answer 1: Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? The short answer is no. thrust::reduce returns a quantity, the result of the reduction. This quantity must be deposited in a host-resident variable: Take for example reduce,

Determining the least element and its position in each matrix column with CUDA Thrust

≯℡__Kan透↙ submitted on 2019-12-18 04:16:21
Question: I have a fairly simple problem but I cannot figure out an elegant solution to it. I have Thrust code which produces c vectors of the same size containing values. Let's say each of these c vectors has an index. For each vector position, I would like to get the index of the c vector whose value is the lowest. Example: C0 = (0,10,20,3,40) C1 = (1,2 ,3 ,5,10) I would get as a result a vector containing the index of the C vector which has the lowest value: result = (0,1 ,1 ,0,1) I have thought
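The computation described in this question can be pinned down with a small host-only reference. This is a sketch in plain C++ rather than Thrust, and the function name argmin_across_vectors is hypothetical; it assumes all input vectors have the same length and breaks ties in favor of the lowest index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each position j, return the index i of the input vector whose
// element vecs[i][j] is the smallest. Ties go to the lowest index.
// Hypothetical helper for illustration; host-only, no Thrust involved.
std::vector<std::size_t> argmin_across_vectors(
    const std::vector<std::vector<int>>& vecs) {
    assert(!vecs.empty());
    const std::size_t n = vecs[0].size();
    std::vector<std::size_t> result(n, 0);  // start by assuming vector 0 wins
    for (std::size_t i = 1; i < vecs.size(); ++i) {
        assert(vecs[i].size() == n);  // all vectors must have the same length
        for (std::size_t j = 0; j < n; ++j) {
            if (vecs[i][j] < vecs[result[j]][j]) {
                result[j] = i;  // vector i holds a strictly smaller value at j
            }
        }
    }
    return result;
}
```

Called with C0 and C1 from the question, this reference yields the result vector the question expects, so it can serve as a check for any GPU version.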

How to asynchronously copy memory from the host to the device using thrust and CUDA streams

岁酱吖の submitted on 2019-12-17 22:25:24
Question: I would like to copy memory from the host to the device using thrust, as in thrust::host_vector<float> h_vec(1 << 28); thrust::device_vector<float> d_vec(1 << 28); thrust::copy(h_vec.begin(), h_vec.end(), d_vec.begin()); using CUDA streams, analogously to how you would copy memory from the device to the device using streams: cudaStream_t s; cudaStreamCreate(&s); thrust::device_vector<float> d_vec1(1 << 28), d_vec2(1 << 28); thrust::copy(thrust::cuda::par.on(s), d_vec1.begin(), d_vec1.end(), d

thrust::max_element slow in comparison cublasIsamax - More efficient implementation?

眉间皱痕 submitted on 2019-12-17 20:32:39
Question: I need a fast and efficient implementation for finding the index of the maximum value in an array in CUDA. This operation needs to be performed several times. I originally used cublasIsamax for this; however, it returns the index of the maximum absolute value, which is not what I want. Instead, I'm using thrust::max_element, but it is rather slow in comparison to cublasIsamax. I use it in the following manner: //d_vector is a pointer on the device pointing to the beginning of

How to normalize matrix columns in CUDA with max performance?

守給你的承諾、 submitted on 2019-12-17 18:36:15
Question: How can matrix columns be normalized efficiently in CUDA? My matrix is stored in column-major order, and the typical size is 2000x200. The operation can be represented by the following MATLAB code: A = rand(2000,200); A = exp(A); A = A./repmat(sum(A,1), [size(A,1) 1]); Can this be done efficiently with Thrust, cuBLAS and/or cuNPP? A quick implementation using 4 kernels is shown as follows. I am wondering whether this can be done in 1 or 2 kernels to improve performance, especially for the column summation
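As a point of reference for the MATLAB snippet in this question, the operation can be sketched on the host in plain C++. This is not the asker's 4-kernel implementation and uses neither Thrust nor cuBLAS; the function name normalize_columns_exp is hypothetical, and the sketch assumes a dense column-major layout in which element (r, c) lives at index c * rows + r.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-place equivalent of: A = exp(A); A = A ./ repmat(sum(A,1), [size(A,1) 1]);
// 'a' stores a rows x cols matrix in column-major order.
// Hypothetical helper for illustration; host-only reference.
void normalize_columns_exp(std::vector<double>& a,
                           std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        double* col = a.data() + c * rows;  // column c is contiguous in memory
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r) {
            col[r] = std::exp(col[r]);  // element-wise exponential
            sum += col[r];              // accumulate the column sum
        }
        for (std::size_t r = 0; r < rows; ++r) {
            col[r] /= sum;  // scale so the column sums to 1
        }
    }
}
```

Because each column is contiguous in column-major storage, the exponential and the column sum can share a single pass over the column, which is the kind of fusion the question asks about.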

Combining two lists by key using Thrust

安稳与你 submitted on 2019-12-17 14:07:02
Question: Given two key-value lists, I am trying to combine the two sides by matching the keys and applying a function to the two values when the keys match. In my case I want to multiply the values. A small example to make it clearer: Left keys: { 1, 2, 4, 5, 6 } Left values: { 3, 4, 1, 2, 1 } Right keys: { 1, 3, 4, 5, 6, 7 }; Right values: { 2, 1, 1, 4, 1, 2 }; Expected output keys: { 1, 4, 5, 6 } Expected output values: { 6, 1, 8, 1 } I have been able to implement this on the CPU in C++ using
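The asker's own CPU implementation is cut off in this excerpt, so the following is only one possible host-side version, not a reconstruction of theirs. It is a two-pointer merge that assumes both key lists are sorted in ascending order with no duplicate keys within a list; the names KeyValueList and join_and_multiply are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Parallel arrays of keys and values, as in the question's example.
// Hypothetical type for illustration.
struct KeyValueList {
    std::vector<int> keys;
    std::vector<int> values;
};

// Keep only keys present on both sides and multiply their values.
// Assumes keys are sorted ascending and unique within each list.
KeyValueList join_and_multiply(const KeyValueList& left,
                               const KeyValueList& right) {
    KeyValueList out;
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < left.keys.size() && j < right.keys.size()) {
        if (left.keys[i] < right.keys[j]) {
            ++i;  // left key has no partner yet, advance left
        } else if (right.keys[j] < left.keys[i]) {
            ++j;  // right key has no partner yet, advance right
        } else {
            out.keys.push_back(left.keys[i]);
            out.values.push_back(left.values[i] * right.values[j]);
            ++i;
            ++j;
        }
    }
    return out;
}
```

With the left and right lists from the question, this produces the expected output keys and values listed there.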

Using Thrust's reduce operator with Pixel uchar4 data error

落爺英雄遲暮 submitted on 2019-12-14 03:28:06
Question: I have been having trouble converting this example from sort to reduce. I keep getting: no suitable conversion function from "uchar4" to "OutputType" exists when I try to compile and run this modified example: thrust::reduce(tptr, tptr+(DIM*DIM), int(0), reduce_functor()); Is the crux of my issue the modified functor, where I was trying to avoid adding chars and instead return the summed int value of the pixels so I can later get the average color of the image? #include <stdio.h>

Multiple occurrence subvector search with cuda Thrust

浪子不回头ぞ submitted on 2019-12-14 03:25:18
Question: I want to find occurrences of a subvector in a device vector on the GPU with the thrust library. Say, for an array str = "aaaabaaab", I need to find occurrences of substr = "ab". How should I use the thrust::find function to search for a subvector? In a nutshell, how should I implement a string search algorithm with the thrust library? Answer 1: I would agree with the comments provided that thrust doesn't provide a single function that does this in "typical thrust fashion" and you would not want to use a sequence of thrust
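To make the expected result concrete, here is a host-only sketch in plain C++ of a naive multiple-occurrence search. It is not the approach from the truncated answer, and the function name find_all_occurrences is hypothetical. Every start position is tested independently of the others, which is the property a one-thread-per-position GPU version would rely on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return every start index at which 'pattern' occurs in 'text',
// including overlapping occurrences.
// Hypothetical helper for illustration; host-only reference.
std::vector<std::size_t> find_all_occurrences(const std::string& text,
                                              const std::string& pattern) {
    std::vector<std::size_t> positions;
    if (pattern.empty() || pattern.size() > text.size()) {
        return positions;  // nothing can match
    }
    const std::size_t last = text.size() - pattern.size();
    for (std::size_t start = 0; start <= last; ++start) {
        // Each start position is checked on its own, with no shared state.
        if (text.compare(start, pattern.size(), pattern) == 0) {
            positions.push_back(start);
        }
    }
    return positions;
}
```

For the question's input, str = "aaaabaaab" and substr = "ab", the matches start at zero-based indices 3 and 7.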

thrust operations empty host array

半世苍凉 submitted on 2019-12-14 03:24:32
Question: I want to do some thrust operations but I am not sure exactly how. Right now, I am receiving an array full of zeros (the h_a array). I have: #include <cstdio> #include <cstdlib> #include <cmath> #include <iostream> #include <cuda.h> #include <cuda_runtime_api.h> #include <thrust/device_ptr.h> #include <thrust/fill.h> #include <thrust/transform.h> #include <thrust/functional.h> #include <thrust/device_vector.h> #include <thrust/host_vector.h> #include <thrust/copy.h> #include <thrust