thrust

Sorting packed vertices with thrust

孤人 submitted on 2019-11-28 14:50:24
So I have a device array of PackedVertex structs: struct PackedVertex { glm::vec3 Vertex; glm::vec2 UV; glm::vec3 Normal; }; I'm trying to sort them so that duplicates are clustered together in the array; I don't care about the overall order at all. I tried sorting them by comparing the lengths of the vectors, which ran but didn't sort them correctly, so now I'm sorting per member, using three stable_sorts with these binary operators: __thrust_hd_warning_disable__ struct sort_packed_verts_by_vertex : public thrust::binary_function < PackedVertex, PackedVertex, bool > { __host__ __device__ bool operator
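A sketch of one possible alternative, not taken from the post (the functor name below is illustrative): instead of three stable_sort passes, a single comparison functor that orders PackedVertex lexicographically over all eight float components also clusters exact duplicates, since any strict weak ordering that distinguishes unequal vertices groups equal ones together. This assumes glm's types are usable in device code (glm supports CUDA when the appropriate defines are enabled):

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <glm/glm.hpp>

struct PackedVertex { glm::vec3 Vertex; glm::vec2 UV; glm::vec3 Normal; };

struct packed_vertex_less
{
    __host__ __device__
    bool operator()(const PackedVertex &a, const PackedVertex &b) const
    {
        // Lexicographic order: Vertex, then UV, then Normal.
        if (a.Vertex.x != b.Vertex.x) return a.Vertex.x < b.Vertex.x;
        if (a.Vertex.y != b.Vertex.y) return a.Vertex.y < b.Vertex.y;
        if (a.Vertex.z != b.Vertex.z) return a.Vertex.z < b.Vertex.z;
        if (a.UV.x     != b.UV.x)     return a.UV.x     < b.UV.x;
        if (a.UV.y     != b.UV.y)     return a.UV.y     < b.UV.y;
        if (a.Normal.x != b.Normal.x) return a.Normal.x < b.Normal.x;
        if (a.Normal.y != b.Normal.y) return a.Normal.y < b.Normal.y;
        return a.Normal.z < b.Normal.z;
    }
};

// usage: thrust::sort(d_verts.begin(), d_verts.end(), packed_vertex_less());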

Detecting ptx kernel of Thrust transform

房东的猫 submitted on 2019-11-28 14:49:17
I have the following thrust::transform call: my_functor *f_1 = new my_functor(); thrust::transform(data.begin(), data.end(), data.begin(), *f_1); I want to identify its corresponding kernel in the PTX file, but there are many kernels containing my_functor in their mangled names. For example: _ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_17for_each_n_detail18for_each_n_closureINS_12zip_iteratorINS_5tupleINS_6detail15normal_iteratorINS_10device_ptrIiEEEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEEjNS9_30device_unary_transform_functorI10my_functorEENS3_20blocked_thread_arrayEEEEEvT_
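One way to narrow this down (a sketch, not from the post; the file name and architecture flag are illustrative): emit PTX for the translation unit and demangle the entry-point names, then look for the entry whose demangled name contains device_unary_transform_functor<my_functor>, which is the closure Thrust wraps around the user functor for transform:

nvcc -arch=sm_35 -ptx transform.cu -o transform.ptx
grep ".entry" transform.ptx | c++filt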

Differences between VexCL, Thrust, and Boost.Compute

那年仲夏 submitted on 2019-11-28 14:46:11
Question: With just a cursory understanding of these libraries, they look to be very similar. I know that VexCL and Boost.Compute use OpenCL as a backend (although as of the 1.0 release VexCL also supports CUDA as a backend) and Thrust uses CUDA. Aside from the different backends, what are the differences between them? Specifically, what problem space do they address, and why would I want to use one over the other? Also, the Thrust FAQ states that the primary barrier to OpenCL support is the lack of

cuda thrust::remove_if throws “thrust::system::system_error” for device_vector?

风格不统一 submitted on 2019-11-28 14:23:25
I am currently using CUDA 7.5 under VS 2013. Today I needed to remove some of the elements from a device_vector, so I decided to use remove_if. However I modify the code, the program compiles fine but throws “thrust::system::system_error” at run time. First I tried my own code: int main() { thrust::host_vector<int> AA(10, 1); thrust::sequence(AA.begin(), AA.end()); thrust::host_vector<bool> SS(10, false); thrust::fill(SS.begin(), SS.begin() + 5, true); thrust::device_vector<int> devAA = AA; thrust::device_vector<bool> devSS = SS; thrust::device_vector<int>::iterator new_end = thrust:
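For reference, here is a self-contained version of the stencil-based remove_if the excerpt appears to be building toward (a sketch under that assumption, not the poster's complete code). The remove_if overload that takes a stencil range plus a predicate removes every element of devAA whose stencil entry satisfies the predicate; note the file must be compiled by nvcc as a .cu translation unit for the device path to be used:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/fill.h>
#include <thrust/remove.h>
#include <thrust/functional.h>

int main()
{
    thrust::host_vector<int> AA(10);
    thrust::sequence(AA.begin(), AA.end());          // AA = 0,1,...,9
    thrust::host_vector<bool> SS(10, false);
    thrust::fill(SS.begin(), SS.begin() + 5, true);  // mark the first five for removal

    thrust::device_vector<int>  devAA = AA;
    thrust::device_vector<bool> devSS = SS;

    // Remove elements of devAA whose corresponding stencil value is true.
    thrust::device_vector<int>::iterator new_end =
        thrust::remove_if(devAA.begin(), devAA.end(),
                          devSS.begin(), thrust::identity<bool>());

    devAA.erase(new_end, devAA.end());               // devAA now holds 5,6,7,8,9
    return 0;
}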

cuda-gdb crashes with thrust (CUDA release 5.5)

China☆狼群 submitted on 2019-11-28 13:11:14
I have the following trivial thrust::gather program (taken directly from the thrust::gather documentation) #include <thrust/gather.h> #include <thrust/device_vector.h> int main(void) { // mark even indices with a 1; odd indices with a 0 int values[10] = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0}; thrust::device_vector<int> d_values(values, values + 10); // gather all even indices into the first half of the range // and odd indices to the last half of the range int map[10] = {0, 2, 4, 6, 8, 1, 3, 5, 7, 9}; thrust::device_vector<int> d_map(map, map + 10); thrust::device_vector<int> d_output(10); thrust:
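The excerpt is cut off at the gather call itself; for context, the thrust::gather reference example completes the program with a call of this form, so that d_output[i] = d_values[d_map[i]]:

    thrust::gather(d_map.begin(), d_map.end(), d_values.begin(), d_output.begin());
    return 0;
}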

thrust::max_element slow in comparison cublasIsamax - More efficient implementation?

心不动则不痛 submitted on 2019-11-28 13:09:15
I need a fast and efficient implementation for finding the index of the maximum value in an array in CUDA. This operation needs to be performed several times. I originally used cublasIsamax for this; however, it sadly returns the index of the maximum absolute value, which is not what I want. Instead, I'm using thrust::max_element, but its speed is rather slow in comparison to cublasIsamax. I use it in the following manner: // d_vector is a pointer on the device pointing to the beginning of the vector, containing nrElements floats. thrust::device_ptr<float> d_ptr = thrust::device_pointer
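For completeness, a minimal sketch of the pattern the excerpt is describing (assuming d_vector was allocated with cudaMalloc and holds nrElements floats):

#include <thrust/device_ptr.h>
#include <thrust/extrema.h>

// Wrap the raw device pointer so Thrust dispatches to the device backend.
thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(d_vector);

// max_element returns an iterator; subtracting the start gives the index.
int maxIdx = thrust::max_element(d_ptr, d_ptr + nrElements) - d_ptr;

Note that each call launches at least one reduction kernel and returns the resulting position to the host, which contributes to per-call latency when the operation is repeated many times.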

Sorting Pixels from opengl using CUDA and Thrust (Windows Port Issues…)

百般思念 submitted on 2019-11-28 12:51:13
Question: I tried to port this example to Windows with GLFW, since I don't have access to a Linux box, but the only thing I get is the clear color and nothing comes up. Did others get this example to work / did I miss something here? I do not even get the original image before the sort, either ... #include <stdio.h> #include <stdlib.h> #include <string.h> #include <thrust/device_vector.h> #include <thrust/sort.h> #include <GL/glew.h> #include <GL/glfw.h> #include <cuda_gl_interop.h> const int WIDTH
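As background for the interop path the includes point at (a generic sketch, not the poster's code; pbo is assumed to be an already-created OpenGL pixel buffer object): the buffer is registered with CUDA once, then mapped each frame so Thrust can sort the pixels in place before OpenGL draws them:

#include <cuda_gl_interop.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

cudaGraphicsResource *resource = 0;
cudaGraphicsGLRegisterBuffer(&resource, pbo, cudaGraphicsMapFlagsNone);  // once, after GL/GLEW init

// each frame:
uchar4 *d_pixels = 0;
size_t  num_bytes = 0;
cudaGraphicsMapResources(1, &resource, 0);
cudaGraphicsResourceGetMappedPointer((void **)&d_pixels, &num_bytes, resource);

thrust::device_ptr<uchar4> p = thrust::device_pointer_cast(d_pixels);
// uchar4 has no operator<, so a comparison functor (e.g. by luminance) is needed:
// thrust::sort(p, p + num_bytes / sizeof(uchar4), pixel_less());

cudaGraphicsUnmapResources(1, &resource, 0);
// ... then draw the PBO with OpenGL as usual

The resource must be unmapped before OpenGL uses the buffer again, and registration requires a current GL context.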

Segmentation error when using thrust::sort in CUDA

早过忘川 submitted on 2019-11-28 12:11:18
Question: I am trying to sort an array of class objects based on its type by passing a comparison function as a parameter to the thrust sort. The class definition: class TetraCutInfo { public: int tetraid; unsigned int ncutEdges; unsigned int ncutNodes; unsigned int type_cut; __host__ __device__ TetraCutInfo(); }; Sort: thrust::sort(cutInfoptr, cutInfoptr + n, cmp()); cutInfoptr is a TetraCutInfo pointer holding the address of device memory allocated using cudaMalloc. Comparison function
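One detail worth noting here (a sketch of a common Thrust pattern, not necessarily the accepted fix for this question): when the storage comes from cudaMalloc, passing the raw pointer to thrust::sort makes Thrust treat it as a host range, which typically faults. Wrapping it in thrust::device_ptr dispatches the sort to the device backend:

#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// cutInfoptr was allocated with cudaMalloc and holds n TetraCutInfo objects.
thrust::device_ptr<TetraCutInfo> dev_ptr = thrust::device_pointer_cast(cutInfoptr);
thrust::sort(dev_ptr, dev_ptr + n, cmp());

Equivalently, newer Thrust releases accept an explicit execution policy: thrust::sort(thrust::device, cutInfoptr, cutInfoptr + n, cmp());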

Counting occurrences of numbers in a CUDA array

一笑奈何 submitted on 2019-11-28 11:22:17
I have an array of unsigned integers stored on the GPU with CUDA (typically 1000000 elements). I would like to count the occurrence of every number in the array. There are only a few distinct numbers (about 10), but these numbers can span from 1 to 1000000. About 9/10ths of the numbers are 0, and I don't need their count. The result looks something like this: 58458 -> 1000 occurrences, 15 -> 412 occurrences. I have an implementation using atomicAdds, but it is too slow (a lot of threads write to the same address). Does someone know of a fast/efficient method? You can implement a histogram
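The answer excerpt breaks off at the word "histogram"; a standard Thrust formulation of a sparse histogram (a sketch assuming the input is already in a device_vector named data) is to sort the values and then collapse runs of equal keys with reduce_by_key over a constant_iterator of ones:

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/iterator/constant_iterator.h>

// data : thrust::device_vector<unsigned int> holding the input values
thrust::sort(data.begin(), data.end());

// Worst case every element is distinct, so size the outputs accordingly.
thrust::device_vector<unsigned int> values(data.size());
thrust::device_vector<int>          counts(data.size());

// Each run of equal values collapses to one (value, run length) pair.
size_t num_distinct = thrust::reduce_by_key(
        data.begin(), data.end(),
        thrust::constant_iterator<int>(1),
        values.begin(), counts.begin()).first - values.begin();

// values[0..num_distinct) holds the distinct numbers (including 0) and
// counts[i] how often values[i] occurs; the entry for 0 can simply be ignored.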

how to cast thrust::device_vector<int> to raw pointer

隐身守侯 submitted on 2019-11-28 09:43:05
I have a thrust device_vector. I want to cast it to a raw pointer so that I can pass it to a kernel. How can I do so? thrust::device_vector<int> dv(10); //CAST TO RAW kernel<<<bl,tpb>>>(pass raw) You can do this using thrust::raw_pointer_cast. The device_vector class has a member function data which returns a thrust::device_ptr to the memory held by the vector, which can then be cast, something like this: thrust::device_vector<int> dv(10); int * dv_ptr = thrust::raw_pointer_cast(dv.data()); kernel<<<bl,tpb>>>(dv_ptr); (disclaimer: written in browser, never compiled, never tested). There is a
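To make the quoted answer concrete, a self-contained version (the kernel body and launch configuration below are illustrative, not from the original):

#include <thrust/device_vector.h>

__global__ void kernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;   // the raw pointer behaves like any other device pointer
}

int main()
{
    thrust::device_vector<int> dv(10);
    int *dv_ptr = thrust::raw_pointer_cast(dv.data());
    kernel<<<1, 10>>>(dv_ptr, (int)dv.size());
    cudaDeviceSynchronize();
    return 0;
}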