thrust

thrust functor: “too many resources requested for launch”

Submitted by 我怕爱的太早我们不能终老 on 2021-02-11 11:55:07
Question: I'm trying to implement something like this in CUDA: for each element,

    p = { p if p >= floor
        { z if p <  floor

where floor and z are constants configured at the start of the test. I have attempted to implement it like so, but I get the error "too many resources requested for launch". A functor:

    struct floor_functor : thrust::unary_function<float, float>
    {
        const float floorLevel, floorVal;
        floor_functor(float _floorLevel, float _floorVal)
            : floorLevel(_floorLevel), floorVal(_floorVal) {}
        __host__ _

Thrust reduce with tuple accumulator

Submitted by 耗尽温柔 on 2021-02-10 06:18:27
Question: I want to use thrust::reduce on a thrust::host_vector of thrust::tuple<double,double>. Because there is no predefined thrust::plus<thrust::tuple<double,double>>, I wrote my own and used the variant of thrust::reduce that takes four arguments. Since I'm a good citizen, I put my custom version of plus in my own namespace, where I left the primary template undefined and specialized it for thrust::tuple<T...>.

    #include <iostream>
    #include <tuple>
    #include <thrust/host_vector.h>
    #include <thrust

CUDA: Why is Thrust so slow at uploading data to the GPU?

Submitted by 梦想与她 on 2021-02-08 09:33:32
Question: I'm new to the GPU world and just installed CUDA to write some programs. I played with the Thrust library but found that it is very slow when uploading data to the GPU: only about 35 MB/s host-to-device on my decent desktop. Why is that? Environment: Visual Studio 2012, CUDA 5.0, GTX760, Intel i7, Windows 7 x64. The GPU bandwidth test says the transfer speed should be at least 11 GB/s host-to-device (and vice versa), but it isn't! Here's the test program:

    #include <iostream>
    #include

thrust copy_if: incomplete type is not allowed

Submitted by 南笙酒味 on 2021-01-29 03:10:50
Question: I'm trying to use thrust::copy_if to compact an array with a predicate that checks for positive numbers. Header file file.h:

    struct is_positive
    {
        __host__ __device__ bool operator()(const int x)
        {
            return (x >= 0);
        }
    };

and file.cu:

    #include "../headers/file.h"
    #include <thrust/device_ptr.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>

    void compact(int* d_inputArray, int* d_outputArray, const int size)
    {
        thrust::device_ptr<int> t_inputArray(d_inputArray);
        thrust::device_ptr<int> t

A Yale professor's 11 "rules of engagement" for graduate research!

Submitted by 谁说胖子不能爱 on 2021-01-23 13:04:29
This article is translated from "Some Modest Advice for Graduate Students" by Yale professor Stephen C. Stearns, a chaired professor of Ecology and Evolutionary Biology whose open course "Principles of Evolution, Ecology and Behavior" is excellent (part of it has been translated into Chinese). Raymond B. Huey, a chaired professor of biology at the University of Washington, says this is the only essay that rivals his own "On Becoming a Better Scientist".

1. Always prepare for the worst

Forewarned is forearmed. Just a little planning ahead can spare you some disasters during your doctoral years. Be cynical. If your research plan may not work out, or if an advisor not only withholds support for it but even sneers at it, then you had better switch to another research topic quickly.

2. Don't expect professors to look after you

In reality, some professors will look after you and some will not. Most probably want to, but they are so swamped every day that they can barely take care of themselves, let alone you. So you had better rely entirely on yourself, and get used to it early. There are several implications here; two key points are: First, decide as early as possible what topic you really want to work on. The degree is yours to earn, not your professor's, and you must fight for it yourself. Of course, your advisor will not just stand by; your advisor will give you some guidance and, to some extent, relieve you of worries about program requirements and funding. But remember

Launch Configuration in Thrust

Submitted by 你离开我真会死。 on 2021-01-20 09:30:19
Question: I am trying to run some experiments on an algorithm coded in Thrust. I'd like to know the impact of the number of threads per block on my algorithm's performance. Is it possible to restrict Thrust so that it uses no more than X threads per block?

Answer 1: Thrust doesn't expose any ability to directly set either the number of threads per block or the number of blocks used in a particular kernel call. These things are determined indirectly by the algorithm and problem size, but you

Thrust scan of just one class member

Submitted by 可紊 on 2021-01-05 12:00:07
Question: I have a custom class myClass with members weight and config. I'd like to run an inclusive scan over a bunch of myClass objects, but only on the weights. Basically, I want to go from:

    [ {configA, weightA}, {configB, weightB}, {configC, weightC}, ... ]

to:

    [ {configA, weightA}, {configB, weightA + weightB}, {configC, weightA + weightB + weightC}, ... ]

Is there a simple way to do this using Thrust's fancy iterators? Since the binaryOp is required to be associative, I don't see how to do

Converting Thrust device iterators to raw pointers

Submitted by 帅比萌擦擦* on 2020-12-25 04:34:51
Question: I'm considering the following simple code, in which I'm converting thrust::host_vector<int>::iterator h_temp_iterator = h_temp.begin(); and thrust::device_vector<int>::iterator d_temp_iterator = d_temp.begin(); to raw pointers. To this end, I'm passing &(h_temp_iterator[0]) and &(d_temp_iterator[0]) to a function and a kernel, respectively. The former (CPU case) compiles; the latter (GPU case) does not. The two cases should in principle be symmetric, so I do not understand the reason for the error
