
thrust functor: “too many resources requested for launch”

我怕爱的太早我们不能终老 submitted on 2021-02-11 11:55:07
Question: I'm trying to implement something like this in CUDA: for each element, p = p if p >= floor, and p = z if p < floor, where floor and z are constants configured at the start of the test. I have attempted to implement it as follows, but I get the error "too many resources requested for launch". A functor: struct floor_functor : thrust::unary_function<float, float> { const float floorLevel, floorVal; floor_functor(float _floorLevel, float _floorVal) : floorLevel(_floorLevel), floorVal(_floorVal){} __host__ _ …
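A minimal sketch of the clamp described above, assuming the data is a thrust::device_vector<float>; the helper apply_floor and the sample values are hypothetical. The functor keeps the question's member names but omits the thrust::unary_function base, which thrust::transform does not need, and marks operator() as const. It shows the transform itself and is not a diagnosis of the launch error.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <cassert>
#include <cstdio>
#include <vector>

// Leaves p unchanged when p >= floorLevel, otherwise replaces it with floorVal.
struct floor_functor {
    const float floorLevel, floorVal;
    floor_functor(float level, float val) : floorLevel(level), floorVal(val) {}

    __host__ __device__ float operator()(float p) const {
        return p >= floorLevel ? p : floorVal;
    }
};

// Hypothetical helper: uploads the data, applies the functor on the GPU, downloads the result.
std::vector<float> apply_floor(const std::vector<float>& in, float level, float val) {
    thrust::device_vector<float> d(in.begin(), in.end());
    thrust::transform(d.begin(), d.end(), d.begin(), floor_functor(level, val));
    std::vector<float> out(d.size());
    thrust::copy(d.begin(), d.end(), out.begin());
    return out;
}

int main() {
    std::vector<float> r = apply_floor({-2.0f, 0.5f, 3.0f, 1.0f}, 1.0f, 0.0f);
    for (float x : r) std::printf("%g ", x);  // prints: 0 0 3 1
    std::printf("\n");
    assert(r[0] == 0.0f && r[1] == 0.0f && r[2] == 3.0f && r[3] == 1.0f);
    return 0;
}
```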

耗尽温柔 submitted on 2021-02-10 06:18:27
Question: I want to use thrust::reduce on a thrust::host_vector of thrust::tuple<double,double>. Because there is no predefined thrust::plus<thrust::tuple<double,double>>, I wrote my own and used the four-argument variant of thrust::reduce. Since I'm a good citizen, I put my custom version of plus in my own namespace, where I left the primary template undefined and specialized it for thrust::tuple<T...>. #include <iostream> #include <tuple> #include <thrust/host_vector.h> #include <thrust …
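One way to get the same reduction, sketched under the assumption that a plain, non-template functor is acceptable: tuple_plus and sum_pairs are hypothetical names, and the code deliberately does not reproduce the namespace-plus-specialization setup from the question, so it sidesteps rather than explains whatever error that setup produced.

```cuda
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>
#include <cassert>
#include <cstdio>

typedef thrust::tuple<double, double> pair_t;

// Element-wise sum of two (double, double) tuples; a plain, non-template functor.
struct tuple_plus {
    __host__ __device__ pair_t operator()(const pair_t& a, const pair_t& b) const {
        return thrust::make_tuple(thrust::get<0>(a) + thrust::get<0>(b),
                                  thrust::get<1>(a) + thrust::get<1>(b));
    }
};

// Four-argument reduce: first, last, initial value, binary operation.
// host_vector iterators make this run on the host.
pair_t sum_pairs(const thrust::host_vector<pair_t>& v) {
    return thrust::reduce(v.begin(), v.end(), pair_t(0.0, 0.0), tuple_plus());
}

int main() {
    thrust::host_vector<pair_t> v;
    v.push_back(thrust::make_tuple(1.0, 2.0));
    v.push_back(thrust::make_tuple(3.0, 4.0));
    v.push_back(thrust::make_tuple(5.0, 6.0));
    pair_t s = sum_pairs(v);
    std::printf("(%g, %g)\n", thrust::get<0>(s), thrust::get<1>(s));  // prints: (9, 12)
    assert(thrust::get<0>(s) == 9.0 && thrust::get<1>(s) == 12.0);
    return 0;
}
```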

CUDA: Why is Thrust so slow at uploading data to the GPU?

梦想与她 submitted on 2021-02-08 09:33:32
Question: I'm new to the GPU world and just installed CUDA to write some programs. I played with the Thrust library but found that it is very slow at uploading data to the GPU: only about 35 MB/s host-to-device on my fairly capable desktop. Why is that? Environment: Visual Studio 2012, CUDA 5.0, GTX 760, Intel i7, Windows 7 x64. GPU bandwidth test: it is supposed to reach at least 11 GB/s host-to-device and device-to-host, but my program didn't! Here's the test program: #include <iostream> #include …
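A measurement sketch, not an explanation of the 35 MB/s figure: it forces CUDA context creation with cudaFree(0), allocates the device buffer, and performs one warm-up copy before timing a single thrust::copy, so one-time setup costs are not counted as transfer time. The function name and buffer size are hypothetical. The host buffer here is ordinary pageable memory, so the number may differ from a bandwidth test that uses pinned memory.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <cuda_runtime.h>
#include <chrono>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Returns the host-to-device throughput of one thrust::copy, in MB/s (1 MB = 1e6 bytes).
double h2d_mb_per_s(std::size_t n_floats) {
    cudaFree(0);                                   // create the CUDA context before timing
    thrust::host_vector<float> h(n_floats, 1.0f);  // pageable host memory
    thrust::device_vector<float> d(n_floats);      // device allocation kept out of the timing
    thrust::copy(h.begin(), h.end(), d.begin());   // warm-up transfer

    auto t0 = std::chrono::steady_clock::now();
    thrust::copy(h.begin(), h.end(), d.begin());   // host-to-device copy returns once finished
    auto t1 = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    double megabytes = n_floats * sizeof(float) / 1.0e6;
    return megabytes / seconds;
}

int main() {
    double rate = h2d_mb_per_s(std::size_t(16) << 20);  // 16 Mi floats = 64 MiB
    std::printf("host-to-device: %.1f MB/s\n", rate);
    assert(rate > 0.0);
    return 0;
}
```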

thrust copy_if: incomplete type is not allowed

南笙酒味 submitted on 2021-01-29 03:10:50
Question: I'm trying to use thrust::copy_if to compact an array with a predicate that checks for positive numbers. Header file file.h: struct is_positive { __host__ __device__ bool operator()(const int x) { return (x >= 0); } }; and #include "../headers/file.h" #include <thrust/device_ptr.h> #include <thrust/device_vector.h> #include <thrust/copy.h> void compact(int* d_inputArray, int* d_outputArray, const int size) { thrust::device_ptr<int> t_inputArray(d_inputArray); thrust::device_ptr<int> t …
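A minimal sketch of the compaction, assuming the code is compiled by nvcc as part of a .cu file. Unlike the question's void compact, this version returns the number of kept elements; compact_on_device and the sample data are hypothetical. As in the question, the predicate tests x >= 0, so zeros survive too despite the name is_positive. It is not a diagnosis of the "incomplete type" error.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <vector>

// Same predicate as in the question: x >= 0, so zeros are kept as well.
struct is_positive {
    __host__ __device__ bool operator()(const int x) const { return x >= 0; }
};

// Wraps raw device pointers in thrust::device_ptr so copy_if runs on the GPU.
// Returns the number of elements written to d_outputArray.
int compact(int* d_inputArray, int* d_outputArray, const int size) {
    thrust::device_ptr<int> in(d_inputArray);
    thrust::device_ptr<int> out(d_outputArray);
    thrust::device_ptr<int> out_end = thrust::copy_if(in, in + size, out, is_positive());
    return static_cast<int>(out_end - out);
}

// Hypothetical driver: copies data to the device, compacts it, copies the result back.
std::vector<int> compact_on_device(const std::vector<int>& h_in) {
    int n = static_cast<int>(h_in.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int kept = compact(d_in, d_out, n);
    std::vector<int> h_out(kept);
    cudaMemcpy(h_out.data(), d_out, kept * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<int> r = compact_on_device({3, -1, 0, -5, 7});
    for (int x : r) std::printf("%d ", x);  // prints: 3 0 7
    std::printf("\n");
    assert(r.size() == 3);
    return 0;
}
```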


谁说胖子不能爱 submitted on 2021-01-23 13:04:29
你离开我真会死。 submitted on 2021-01-20 09:30:19
Question: I am trying to run some experiments on an algorithm coded in Thrust. I'd like to know the impact of the number of threads per block on the performance of my algorithm. Is it possible to restrict Thrust so that it does not use more than X threads per block? Answer 1: Thrust doesn't expose any ability to directly set either the number of threads per block or the number of blocks used in a particular kernel call. These things are indirectly determined by the algorithm and problem size, but you …
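The answer states that Thrust does not let you choose the block size. For a block-size experiment, one option outside Thrust is a hand-written kernel, where threads per block is an explicit launch parameter. The sketch below assumes that approach; scale_kernel and scale_with_block_size are hypothetical, thrust::device_vector only manages the memory, and timing of each launch is left out.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

// Hypothetical kernel: multiplies each element by factor.
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches scale_kernel with a caller-chosen number of threads per block.
thrust::host_vector<float> scale_with_block_size(const thrust::host_vector<float>& in,
                                                 float factor, int threads_per_block) {
    thrust::device_vector<float> d(in);
    int n = static_cast<int>(d.size());
    if (n > 0) {
        int blocks = (n + threads_per_block - 1) / threads_per_block;
        scale_kernel<<<blocks, threads_per_block>>>(thrust::raw_pointer_cast(d.data()), n, factor);
        cudaDeviceSynchronize();
    }
    return thrust::host_vector<float>(d);
}

int main() {
    thrust::host_vector<float> h(1000, 1.5f);
    int sizes[] = {32, 128, 256, 1024};
    for (int tpb : sizes) {
        thrust::host_vector<float> r = scale_with_block_size(h, 2.0f, tpb);
        std::printf("threads/block = %4d -> r[999] = %g\n", tpb, r[999]);
        assert(r[999] == 3.0f);
    }
    return 0;
}
```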


可紊 submitted on 2021-01-05 12:00:07
Question: I have a custom class myClass which has members weight and config. I'd like to run an inclusive scan on a bunch of myClass objects, but only on the weights. Basically, what I want is to take [ {configA, weightA}, {configB, weightB}, {configC, weightC}, ...] to [ {configA, weightA}, {configB, weightA + weightB}, {configC, weightA + weightB + weightC}, ...]. Is there a simple way to do this using Thrust's fancy iterators? Since the binaryOp is required to be associative, I don't see how to do …
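One possible answer to the associativity concern, sketched with a hypothetical myClass (int config, float weight): a binary operator that keeps the right-hand operand's config and adds the weights. Expanding both groupings gives (a op b) op c = a op (b op c) = {c.config, a.weight + b.weight + c.weight}, so the operator is associative (up to floating-point rounding), and each output element keeps its own config while its weight becomes the running sum. It is not commutative, which inclusive_scan does not require. Note that this uses a custom operator rather than fancy iterators.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the poster's class: config is carried along, weight is summed.
struct myClass {
    int config;
    float weight;
};

// Keeps the right-hand operand's config and adds the weights (associative, not commutative).
struct weight_sum {
    __host__ __device__ myClass operator()(const myClass& a, const myClass& b) const {
        myClass r;
        r.config = b.config;
        r.weight = a.weight + b.weight;
        return r;
    }
};

// In-place inclusive scan on the device; returns the result on the host.
thrust::host_vector<myClass> scan_weights(const thrust::host_vector<myClass>& in) {
    thrust::device_vector<myClass> d(in);
    thrust::inclusive_scan(d.begin(), d.end(), d.begin(), weight_sum());
    return thrust::host_vector<myClass>(d);
}

int main() {
    thrust::host_vector<myClass> h(3);
    for (int i = 0; i < 3; ++i) {
        h[i].config = 10 + i;
        h[i].weight = float(i + 1);
    }
    thrust::host_vector<myClass> r = scan_weights(h);
    // prints {config 10, weight 1}, {config 11, weight 3}, {config 12, weight 6}, one per line
    for (int i = 0; i < 3; ++i) {
        myClass m = r[i];
        std::printf("{config %d, weight %g}\n", m.config, m.weight);
    }
    assert(r[2].config == 12 && r[2].weight == 6.0f);
    return 0;
}
```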

帅比萌擦擦* submitted on 2020-12-25 04:34:51
Question: I'm considering the following simple code, in which I'm converting thrust::host_vector<int>::iterator h_temp_iterator = h_temp.begin(); and thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin(); to raw pointers. To this end, I'm passing &(h_temp_iterator[0]) and &(d_temp_iterator[0]) to a function and a kernel, respectively. The former (CPU case) compiles; the latter (GPU case) does not. The two cases should in principle be symmetric, so I do not understand the reason for the error …
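A sketch of the likely reason for the asymmetry, based on how the two iterator types dereference: h_temp_iterator[0] is a plain int&, so its address is an int*, whereas d_temp_iterator[0] is a thrust::device_reference<int>, whose address is a thrust::device_ptr<int> rather than an int*, so a kernel parameter of type int* rejects it. thrust::raw_pointer_cast unwraps it. The functions add_one_cpu, add_one_gpu, and run_both are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

// Adds 1 to each of the n elements on the host.
void add_one_cpu(int* p, int n) {
    for (int i = 0; i < n; ++i) p[i] += 1;
}

// Adds 1 to each of the n elements on the device.
__global__ void add_one_gpu(int* p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

// Runs both versions on zero-filled vectors of length n (n must be > 0) and
// returns true if every element ended up equal to 1 in both.
bool run_both(int n) {
    thrust::host_vector<int> h_temp(n, 0);
    thrust::device_vector<int> d_temp(n, 0);
    thrust::host_vector<int>::iterator h_temp_iterator = h_temp.begin();
    thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin();

    // h_temp_iterator[0] is a plain int&, so taking its address already yields int*.
    add_one_cpu(&(h_temp_iterator[0]), n);

    // d_temp_iterator[0] is a thrust::device_reference<int>; its address is a
    // thrust::device_ptr<int>, which raw_pointer_cast unwraps to int*.
    int* d_raw = thrust::raw_pointer_cast(&(d_temp_iterator[0]));
    add_one_gpu<<<(n + 255) / 256, 256>>>(d_raw, n);
    cudaDeviceSynchronize();

    thrust::host_vector<int> d_result(d_temp);
    for (int i = 0; i < n; ++i)
        if (h_temp[i] != 1 || d_result[i] != 1) return false;
    return true;
}

int main() {
    std::printf("%s\n", run_both(1000) ? "both ok" : "mismatch");  // prints: both ok
    return 0;
}
```

thrust::raw_pointer_cast(d_temp.data()) is an equivalent way to obtain the same raw pointer without going through the iterator.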
