thrust

Why is the Thrust library's intersection returning an unexpected result?

て烟熏妆下的殇ゞ submitted on 2019-12-01 10:22:29
Question: I'm using the Thrust library to compute the intersection of two large sets of integers. In tests with two small inputs I got correct results, but when I use two sets with 10^8 and 65535*1024 elements I get an empty set. Can anyone explain this problem? When I change the first two variables to smaller values, Thrust returns the expected intersection set. My code follows.
#include <thrust/set_operations.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <iostream>
#include

How to use CUB and Thrust in one CUDA code

风流意气都作罢 submitted on 2019-12-01 09:45:12
Question: I'm trying to introduce some CUB into my "old" Thrust code, and so have started with a small example to compare thrust::reduce_by_key with cub::DeviceReduce::ReduceByKey, both applied to thrust::device_vectors. The Thrust part of the code is fine, but the CUB part, which naively uses raw pointers obtained via thrust::raw_pointer_cast, crashes after the CUB calls. I put in a cudaDeviceSynchronize() to try to solve this problem, but it didn't help. The CUB part of the code was cribbed from

Removing elements from a device_vector

為{幸葍}努か submitted on 2019-12-01 07:39:59
thrust::device_vector values thrust::device_vector keys; After initialization, keys contains some elements equal to -1. I want to delete those elements from keys, along with the elements at the same positions in values, but I do not know how to do this in parallel. There are probably many ways to do this. One possible way: (1) use the stencil version of thrust::remove_if (documentation), with the keys as your stencil, removing the elements in values where the corresponding key is -1. You will need to create a functor for the predicate test. (2) use thrust::remove (documentation) on the keys to remove the values that

Removing elements from a device_vector

天涯浪子 submitted on 2019-12-01 05:24:43
Question: thrust::device_vector values thrust::device_vector keys; After initialization, keys contains some elements equal to -1. I want to delete those elements from keys, along with the elements at the same positions in values, but I do not know how to do this in parallel. Answer 1: There are probably many ways to do this. One possible way: use the stencil version of thrust::remove_if (documentation), with the keys as your stencil, removing the elements in values where the corresponding key is -1. You will need to create a

Thrust transform throws error: “bulk_kernel_by_value: an illegal memory access was encountered”

心不动则不痛 submitted on 2019-12-01 01:36:51
I'm rather new to CUDA/Thrust and have a problem with a code snippet. To make it easier I have trimmed it down to the bare minimum. The code is the following:
struct functor{
    functor(float (*g)(const float&)) : _g{g} {}
    __host__ __device__ float operator()(const float& x) const { return _g(x); }
private:
    float (*_g)(const float&);
};
__host__ __device__ float g(const float& x){return 3*x;}
int main(void){
    thrust::device_vector<float> X(4,1);
    thrust::transform(X.begin(), X.end(), X.begin(), functor(&g));
}
The idea is that I can pass any function to the functor, so I can apply that function to

Getting CUDA Thrust to use a CUDA stream of your choice

房东的猫 submitted on 2019-11-30 16:10:08
Looking at kernel launches within the code of CUDA Thrust, it seems they always use the default stream. Can I make Thrust use a stream of my choice? Am I missing something in the API?
Answer (JackOLantern): I want to update the answer provided by talonmies following the release of Thrust 1.8, which makes it possible to specify the CUDA execution stream as thrust::cuda::par.on(stream); see also Thrust Release 1.8.0. In the following, I'm recasting the example in False dependency issue for the Fermi architecture in terms of CUDA Thrust APIs.
#include <iostream>
#include "cuda_runtime.h"

Pairwise operation on segmented data in CUDA/thrust

試著忘記壹切 submitted on 2019-11-29 23:44:23
Question: Suppose I have a data array, an array containing keys that reference entries in the data array, and a third array that contains an id for every key array entry, e.g.
DataType dataArray[5];
int keyArray[10] = {1, 2, 3, 1, 2, 2, 1, 1, 1, 1};
int ids[10] = {0, 0, 0, 1, 2, 2, 2, 3, 3, 3};
How can I execute a custom operator ResultDataType fun(int key1, int key2, int id) pairwise for each segment of ids, ignoring the case key1 == key2, using Thrust? In this example I'd like to execute and store the

Getting CUDA Thrust to use a CUDA stream of your choice

江枫思渺然 submitted on 2019-11-29 23:06:23
Question: Looking at kernel launches within the code of CUDA Thrust, it seems they always use the default stream. Can I make Thrust use a stream of my choice? Am I missing something in the API? Answer 1: I want to update the answer provided by talonmies following the release of Thrust 1.8, which makes it possible to specify the CUDA execution stream as thrust::cuda::par.on(stream); see also Thrust Release 1.8.0. In the following, I'm recasting the example in False dependency issue for the Fermi

Differences between VexCL, Thrust, and Boost.Compute

感情迁移 submitted on 2019-11-29 19:52:00
With just a cursory understanding of these libraries, they look to be very similar. I know that VexCL and Boost.Compute use OpenCL as a backend (although as of the v1.0 release VexCL also supports CUDA as a backend) and Thrust uses CUDA. Aside from the different backends, what's the difference between them? Specifically, what problem space do they address, and why would I want to use one over the other? Also, the Thrust FAQ states: "The primary barrier to OpenCL support is the lack of an OpenCL compiler and runtime with support for C++ templates". If this is the case, how is it possible

Segmentation error when using thrust::sort in CUDA

情到浓时终转凉″ submitted on 2019-11-29 18:14:17
I am trying to sort an array of class objects by their type by passing a comparison function as a parameter to thrust::sort. The class definition:
class TetraCutInfo {
public:
    int tetraid;
    unsigned int ncutEdges;
    unsigned int ncutNodes;
    unsigned int type_cut;
    __host__ __device__ TetraCutInfo();
};
Sort:
thrust::sort(cutInfoptr,cutInfoptr+n,cmp());
cutInfoptr is a pointer of type TetraCutInfo holding the address of device memory allocated using cudaMalloc. Comparison function:
struct cmp { __host__ __device__ bool operator()(const TetraCutInfo x, TetraCutInfo y) { return (x.type_cut