thrust

nVidia Thrust: device_ptr Const-Correctness

Posted by 依然范特西╮ on 2019-12-12 12:14:49
Question: In my project, which makes extensive use of NVIDIA CUDA, I sometimes use Thrust for things that it does very, very well. Reduce is one algorithm that is particularly well implemented in that library, and one use of reduce is to normalise a vector of non-negative elements by dividing each element by the sum of all elements. template <typename T> void normalise(T const* const d_input, const unsigned int size, T* d_output) { const thrust::device_ptr<T> X = thrust::device_pointer_cast(const_cast<T*…
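A minimal sketch of the normalisation described above, using thrust::reduce for the sum and thrust::transform with a placeholder expression for the division. It assumes a Thrust version whose device_pointer_cast accepts const pointers directly (older releases may need the const_cast shown in the question); the names mirror the question's signature and a non-zero sum is assumed.

    #include <thrust/device_ptr.h>
    #include <thrust/reduce.h>
    #include <thrust/transform.h>
    #include <thrust/functional.h>

    template <typename T>
    void normalise(const T* d_input, unsigned int size, T* d_output)
    {
        // Wrap the raw device pointers so Thrust dispatches to the device backend.
        thrust::device_ptr<const T> in  = thrust::device_pointer_cast(d_input);
        thrust::device_ptr<T>       out = thrust::device_pointer_cast(d_output);

        // Sum of all elements (assumed non-zero for the division below).
        const T sum = thrust::reduce(in, in + size, T(0));

        // Divide every element by the sum; _1 is a Thrust placeholder.
        using namespace thrust::placeholders;
        thrust::transform(in, in + size, out, _1 / sum);
    }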

CUDA Thrust sort_by_key when the key is a tuple dealt with by zip_iterator's with custom comparison predicate

Posted by 六月ゝ 毕业季﹏ on 2019-12-12 11:28:49
Question: I've looked through a lot of questions here for something similar, and there are quite a few, albeit with one minor change. I'm trying to sort values with a zip_iterator as a compound key. Specifically, I have the following function: void thrustSort( unsigned int * primaryKey, float * secondaryKey, unsigned int * values, unsigned int numberOfPoints) { thrust::device_ptr<unsigned int> dev_ptr_pkey = thrust::device_pointer_cast(primaryKey); thrust::device_ptr<float> dev_ptr_skey = thrust::device_pointer_cast…
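The pattern the question asks about can be sketched as below (a hypothetical completion, not the asker's actual code): the primary and secondary keys are zipped into a compound key, the raw pointers are wrapped in typed device_ptrs, and a custom comparator orders by primary key first, then secondary key.

    #include <thrust/device_ptr.h>
    #include <thrust/sort.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    // Compound-key comparator: primary key first, then secondary key.
    struct compound_less
    {
        __host__ __device__
        bool operator()(const thrust::tuple<unsigned int, float>& a,
                        const thrust::tuple<unsigned int, float>& b) const
        {
            if (thrust::get<0>(a) != thrust::get<0>(b))
                return thrust::get<0>(a) < thrust::get<0>(b);
            return thrust::get<1>(a) < thrust::get<1>(b);
        }
    };

    void thrustSort(unsigned int* primaryKey, float* secondaryKey,
                    unsigned int* values, unsigned int numberOfPoints)
    {
        thrust::device_ptr<unsigned int> pkey = thrust::device_pointer_cast(primaryKey);
        thrust::device_ptr<float>        skey = thrust::device_pointer_cast(secondaryKey);
        thrust::device_ptr<unsigned int> vals = thrust::device_pointer_cast(values);

        // Sort the payload 'vals' by the zipped (primary, secondary) key.
        thrust::sort_by_key(
            thrust::make_zip_iterator(thrust::make_tuple(pkey, skey)),
            thrust::make_zip_iterator(thrust::make_tuple(pkey + numberOfPoints,
                                                         skey + numberOfPoints)),
            vals,
            compound_less());
    }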

Multiple GPUs with Cuda Thrust?

Posted by 依然范特西╮ on 2019-12-12 09:47:21
Question: How do I use Thrust with multiple GPUs? Is it simply a matter of using cudaSetDevice(deviceId) and then running the relevant Thrust code? Answer 1: With CUDA 4.0 or later, cudaSetDevice(deviceId) followed by your Thrust code should work. Just keep in mind that you will need to create and operate on separate vectors on each device (unless you have devices that support peer-to-peer memory access and the PCI-Express bandwidth is sufficient for your task). Source: https://stackoverflow.com/questions/8289860
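A minimal sketch of the pattern described in the answer, with placeholder sizes and a placeholder reduction: select each device in turn, then create and operate on vectors that live on that device.

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <cuda_runtime.h>

    int main()
    {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);

        for (int dev = 0; dev < deviceCount; ++dev)
        {
            cudaSetDevice(dev);                              // subsequent Thrust calls target 'dev'
            thrust::device_vector<float> v(1 << 20, 1.0f);   // allocated on device 'dev'
            float sum = thrust::reduce(v.begin(), v.end());  // runs on device 'dev'
            (void)sum;
            // 'v' is destroyed here, while 'dev' is still the current device.
        }
        return 0;
    }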

thrust::copy doesn't work for device_vectors [duplicate]

Posted by ☆樱花仙子☆ on 2019-12-12 05:38:57
Question: This question already has an answer here: cuda thrust::remove_if throws “thrust::system::system_error” for device_vector? (1 answer). Closed 3 years ago. I copied this code from the Thrust documentation: #include <thrust/copy.h> #include <thrust/device_vector.h> #include <thrust/host_vector.h> int main() { thrust::device_vector<int> vec0(100); thrust::device_vector<int> vec1(100); thrust::copy(vec0.begin(), vec0.end(), vec1.begin()); return 0; } When I run this in Debug mode (VS2012), my…

What is the optimal way to use additional data fields in functors in Thrust?

Posted by 烈酒焚心 on 2019-12-12 02:57:27
Question: What is the proper (or optimal) way to use some constant data in functors used in Thrust algorithms like thrust::transform? The naive way I used was simply to allocate the required arrays inside the functor's operator() method, like this: struct my_functor { __host__ __device__ float operator()(thrust::tuple<float, float> args) { float A[2][10] = { { 4.0, 1.0, 8.0, 6.0, 3.0, 2.0, 5.0, 8.0, 6.0, 7.0 }, { 4.0, 1.0, 8.0, 6.0, 7.0, 9.0, 5.0, 1.0, 2.0, 3.6 }}; float x1 = thrust::get<0>(args); float x2 =…
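One commonly suggested alternative (a sketch, not necessarily the optimal choice for every case) is to make the table a functor member that is filled once on the host and copied to the device together with the functor, instead of being rebuilt on every call to operator(); the computation inside operator() and the driver code below are placeholders.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    struct my_functor
    {
        float A[2][10];   // set up once on the host, copied to the device with the functor

        explicit my_functor(const float (&table)[2][10])
        {
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 10; ++j)
                    A[i][j] = table[i][j];
        }

        __host__ __device__
        float operator()(thrust::tuple<float, float> args) const
        {
            float x1 = thrust::get<0>(args);
            float x2 = thrust::get<1>(args);
            return A[0][0] * x1 + A[1][0] * x2;   // placeholder computation
        }
    };

    int main()
    {
        const float table[2][10] = {
            { 4.0f, 1.0f, 8.0f, 6.0f, 3.0f, 2.0f, 5.0f, 8.0f, 6.0f, 7.0f },
            { 4.0f, 1.0f, 8.0f, 6.0f, 7.0f, 9.0f, 5.0f, 1.0f, 2.0f, 3.6f }};

        thrust::device_vector<float> x1(100, 1.0f), x2(100, 2.0f), y(100);
        thrust::transform(
            thrust::make_zip_iterator(thrust::make_tuple(x1.begin(), x2.begin())),
            thrust::make_zip_iterator(thrust::make_tuple(x1.end(),   x2.end())),
            y.begin(),
            my_functor(table));
        return 0;
    }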

Thrust error with CUDA separate compilation

Posted by 寵の児 on 2019-12-12 00:35:38
Question: I'm running into an error when I try to compile CUDA with relocatable device code enabled (-rdc=true). I'm using Visual Studio 2013 as the compiler with CUDA 7.5. Below is a small example that shows the error. To clarify, the code below runs fine when -rdc=false, but when it is set to true the error shows up. The error simply says: CUDA error 11 [\cuda\detail\cub\device\dispatch/device_radix_sort_dispatch.cuh, 687]: invalid argument. Then I found this, which says: When invoked with primitive data…

functor with nested calls to CUDA::thrust functors operating as zip_iterator

Posted by 徘徊边缘 on 2019-12-11 22:44:41
Question: I ran into some difficulties trying to implement ODE solver routines running on GPUs, using CUDA::Thrust iterators to solve a bunch of coupled first-order equations on the GPU. I want to work around the approach in the former question, enabling the user to write the systems of equations in as human-readable a form as possible, using arbitrary functors working on tuples of vectors. In detail, here is a small piece of code: #include <thrust/device_vector.h> #include <thrust/transform.h> #include <thrust/sequence…
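A stripped-down sketch of the underlying pattern the question builds on (the two-variable system, the coefficients, and the functor name are placeholders): a functor receives one tuple of state components per element via a zip_iterator and returns one component of the right-hand side.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>

    // dx/dt for a placeholder two-variable linear system.
    struct rhs_x
    {
        __host__ __device__
        float operator()(const thrust::tuple<float, float>& state) const
        {
            float x = thrust::get<0>(state);
            float y = thrust::get<1>(state);
            return -y + 0.1f * x;
        }
    };

    int main()
    {
        const int n = 1024;
        thrust::device_vector<float> x(n, 1.0f), y(n, 0.0f), dxdt(n);

        // Evaluate the right-hand side for every element of the zipped state.
        thrust::transform(
            thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin())),
            thrust::make_zip_iterator(thrust::make_tuple(x.end(),   y.end())),
            dxdt.begin(),
            rhs_x());
        return 0;
    }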

Checking if a matrix contains nans or infinite values in CUDA

Posted by 天涯浪子 on 2019-12-11 17:26:34
Question: What is an efficient way to check a large matrix for inf/nan elements in CUDA (C++)? The matrix is stored as a float* in GPU memory. I don't need the location of those elements, just a boolean yes/no answer as to whether at least one bad entry is present. The options are: have one kernel check the whole array (easy to implement but probably slow); have multiple kernels check, e.g., the rows and combine the output with OR (are there any CUDA built-ins for doing this efficiently?); ...other ideas? Thanks!
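One possible single-call alternative to a hand-written kernel (a sketch, with hypothetical names such as has_bad_entry): map every element to a bool "is this entry non-finite?" and OR-reduce the results on the device with thrust::transform_reduce.

    #include <thrust/device_ptr.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <cfloat>
    #include <cstddef>

    // Predicate: true if the element is NaN or +/-infinity. Written with plain
    // comparisons so it compiles for both host and device without relying on
    // isnan()/isinf() overload availability.
    struct not_finite
    {
        __host__ __device__
        bool operator()(float x) const
        {
            return (x != x) || (x > FLT_MAX) || (x < -FLT_MAX);
        }
    };

    // Returns true if at least one bad entry is present in the n-element device array.
    bool has_bad_entry(const float* d_data, size_t n)
    {
        thrust::device_ptr<const float> p = thrust::device_pointer_cast(d_data);
        return thrust::transform_reduce(p, p + n, not_finite(), false,
                                        thrust::logical_or<bool>());
    }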

cuda9 + thrust sort_by_key overlayed with H2D copy (using streams)

Posted by 五迷三道 on 2019-12-11 16:42:59
Question: I would like to overlap a thrust::sort_by_key operation with a host-to-device copy. Despite taking a cudaStream_t as an argument, my experiments seem to show that thrust::sort_by_key is a blocking operation. Below I attach a full code example in which I first measure the time to copy the data (from pinned memory), then I measure the time to do the sort_by_key. Finally, I try to overlap the two operations. I would expect to see the copy time hidden by the sort_by_key operation; instead, I…
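A sketch of the attempted overlap, with hypothetical names: the sort is bound to one stream via thrust::cuda::par.on() while the host-to-device copy from pinned memory is issued on another stream. As the question observes, sort_by_key may still synchronise internally (for example when it allocates temporary storage), so true overlap is not guaranteed on every Thrust/CUDA version.

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/system/cuda/execution_policy.h>
    #include <cuda_runtime.h>

    void sort_and_copy(thrust::device_vector<unsigned int>& keys,
                       thrust::device_vector<float>&        vals,
                       const float* h_pinned, float* d_dst, size_t bytes,
                       cudaStream_t sortStream, cudaStream_t copyStream)
    {
        // Host-to-device copy on its own stream (h_pinned assumed to be pinned memory).
        cudaMemcpyAsync(d_dst, h_pinned, bytes, cudaMemcpyHostToDevice, copyStream);

        // Sort bound to a different stream via the CUDA execution policy.
        thrust::sort_by_key(thrust::cuda::par.on(sortStream),
                            keys.begin(), keys.end(), vals.begin());

        cudaStreamSynchronize(copyStream);
        cudaStreamSynchronize(sortStream);
    }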

Thrust exception: “thrust::system::system_error at memory location 0x00000000”

Posted by 别等时光非礼了梦想. on 2019-12-11 13:59:29
Question: I wrote this code for a CUDA kernel assign() that uses the class device_vector to initialise a vector. The kernel is launched by a class member function, as a solution to the question CUDA kernel as member function of a class and following https://devtalk.nvidia.com/default/topic/573289/mixing-c-and-cuda/. I'm using a GTX 650 Ti GPU, Windows 8.1, Visual Studio 2013 Community and CUDA Toolkit 7.5. The file initTest.cu does compile, but an exception is thrown with reference to the file trivial…