thrust | 易学教程

How to use Thrust to sort the rows of a matrix?

Question: I have a 5000x500 matrix and I want to sort each row separately with CUDA. I can use ArrayFire, but its implementation is just a for loop over thrust::sort, which should not be efficient. https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cuda/kernel/sort.hpp
    for(dim_type w = 0; w < val.dims[3]; w++) {
        dim_type valW = w * val.strides[3];
        for(dim_type z = 0; z < val.dims[2]; z++) {
            dim_type valWZ = valW + z * val.strides[2];
            for(dim_type y = 0; y < val.dims[1]; y++) {
                dim_type valOffset =
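One common way to avoid a per-row loop is to sort the whole matrix in two stable passes: first by value, carrying a row index along, then by row index. The following is only a sketch of that idea, not the method from the question or from ArrayFire. It assumes a row-major matrix of float stored in a thrust::device_vector; the names RowIndex and sort_rows are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical helper: maps a flat element index to its row index.
struct RowIndex {
    int cols;
    __host__ __device__ int operator()(int i) const { return i / cols; }
};

// Sorts every row of a row-major rows x cols matrix in ascending order.
// Assumption: the matrix lives in a thrust::device_vector<float>.
void sort_rows(thrust::device_vector<float>& data, int rows, int cols) {
    const int n = rows * cols;

    // Label every element with the row it belongs to.
    thrust::device_vector<int> row_ids(n);
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(n),
                      row_ids.begin(),
                      RowIndex{cols});

    // Pass 1: sort all values globally, moving the row labels with them.
    thrust::stable_sort_by_key(data.begin(), data.end(), row_ids.begin());

    // Pass 2: group by row label. Because the sort is stable, the values
    // inside each row keep the ascending order produced by pass 1.
    thrust::stable_sort_by_key(row_ids.begin(), row_ids.end(), data.begin());
}
```

For a 5000x500 matrix this would be called as sort_rows(matrix, 5000, 500). The sketch trades extra memory (one int per element) and two full sorts for the absence of a host-side loop; whether that is faster than per-row sorting depends on the hardware and sizes and should be measured.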

thrust set operations not compiling [duplicate]

Question: This question already has an answer here: "thrust set difference fails to compile with calling a __host__ function from a __host__ __device__ function is not allowed" (1 answer). Closed last year. I tried a simple program using Thrust's set operations. It finds the difference of two sets. However, I get a compilation error.
    #include <thrust/set_operations.h>
    #include <thrust/execution_policy.h>
    #include <thrust/device_vector.h>
    int main() {
        thrust::device_vector<int> A1(7);
        thrust::device_vector<int> A2(5);

How to sort with less precision on keys with Thrust library

Question: I have a set of integer values and I want to sort them using Thrust. Is there a possibility of using only some of the high or low bits in this sort? If possible, I do not want to use a user-defined comparator, because it changes the algorithm used from radix sort to merge sort and increases the elapsed time considerably. I think that when all the numbers have the same value on a bit, that bit is skipped while sorting, so is it feasible to use the lowest possible number of bits and hope it will be sufficient? (i.e.:

Thrust reduction and overloaded operator-(const float3&, const float3&) won't compile

Question: I overload operators to have a vector space over float3 (and similar structs) in vectorspace.cuh:
    // Boilerplate vector space over data type Pt
    #pragma once
    #include <type_traits>
    // float3
    __device__ __host__ float3 operator+=(float3& a, const float3& b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; }
    __device__ __host__ float3 operator*=(float3& a, const float b) { a.x *= b; a.y *= b; a.z *= b; return a; }
    // float4
    __device__ __host__ float4 operator+=(float4& a, const float4& b) { a.x

Thrust inside user written kernels

Question: I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code. I would like to know whether I can pass a device_vector to my own kernel, and how. If so, which operations are permitted on it inside kernel/device code?
Answer 1: As it was originally written, Thrust is purely a host-side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this: thrust::device_vector< Foo >
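The answer's code is cut off in this excerpt, so the following is only a minimal sketch of the usual pattern, not the answer's own code. It uses thrust::raw_pointer_cast to obtain the plain device pointer held by the vector and hands that pointer to a kernel. The element type float (instead of Foo) and the names scale_kernel and scale_on_device are assumptions made for illustration.

```cuda
#include <thrust/device_vector.h>

// Hypothetical kernel: doubles each element in place.
__global__ void scale_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
    }
}

// Hypothetical wrapper: launches the kernel on the vector's storage.
void scale_on_device(thrust::device_vector<float>& v) {
    const int n = static_cast<int>(v.size());
    if (n == 0) {
        return;  // nothing to launch
    }

    // Extract the raw device pointer; the kernel sees an ordinary array.
    float* raw = thrust::raw_pointer_cast(v.data());

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(raw, n);

    // Wait for the kernel so later host-side reads see the result.
    cudaDeviceSynchronize();
}
```

Inside the kernel only the raw pointer and an element count are available; the device_vector object itself, with its resizing and iterator interface, stays on the host.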

simple sorting using thrust not working

Question: I have a CUDA Thrust program as follows:
    #include <stdio.h>
    #include <iostream>
    #include <cuda.h>
    #include <thrust/sort.h>
    // main routine that executes on the host
    int main(void) {
        int *a_h, *a_d; // Pointer to host & device arrays
        const int N = 10; // Number of elements in arrays
        size_t size = N * sizeof(int);
        a_h = (int *)malloc(size); // Allocate array on host
        cudaMalloc((void **) &a_d, size); // Allocate array on device
        std::cout<<"enter the 10 numbers";
        // Initialize host array and copy it to CUDA

Thrust filter by key value

Question: In my application I have a class like this:
    class sample{
        thrust::device_vector<int> edge_ID;
        thrust::device_vector<float> weight;
        thrust::device_vector<int> layer_ID;
        /*functions, zip_iterators etc. */
    };
At a given index, every vector stores the corresponding data of the same edge. I want to write a function that filters out all the edges of a given layer, something like this:
    void filter(const sample& src, sample& dest, const int& target_layer){
        for(...){
            if( src.layer_ID[x] == target_layer
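A loop like the one in the question can usually be replaced by a single thrust::copy_if over a zip iterator. The sketch below is one possible approach under stated assumptions, not the accepted solution: it assumes "filter" means copying the edges whose layer_ID equals target_layer into dest, and it uses a hypothetical struct Sample with public members plus hypothetical names InLayer and filter_layer.

```cuda
#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Hypothetical stand-in for the question's class, with public members.
struct Sample {
    thrust::device_vector<int>   edge_ID;
    thrust::device_vector<float> weight;
    thrust::device_vector<int>   layer_ID;
};

// Predicate on a zipped (edge_ID, weight, layer_ID) element:
// true when the third component equals the target layer.
struct InLayer {
    int target;
    template <typename Tuple>
    __host__ __device__ bool operator()(const Tuple& t) const {
        return thrust::get<2>(t) == target;
    }
};

// Copies every edge of src whose layer matches target_layer into dest.
void filter_layer(const Sample& src, Sample& dest, int target_layer) {
    const std::size_t n = src.edge_ID.size();

    // Reserve worst-case space; shrunk to the real count afterwards.
    dest.edge_ID.resize(n);
    dest.weight.resize(n);
    dest.layer_ID.resize(n);

    auto in = thrust::make_zip_iterator(thrust::make_tuple(
        src.edge_ID.begin(), src.weight.begin(), src.layer_ID.begin()));
    auto out = thrust::make_zip_iterator(thrust::make_tuple(
        dest.edge_ID.begin(), dest.weight.begin(), dest.layer_ID.begin()));

    // copy_if is stable, so matching edges keep their relative order.
    auto out_end = thrust::copy_if(in, in + n, out, InLayer{target_layer});

    const std::size_t kept = static_cast<std::size_t>(out_end - out);
    dest.edge_ID.resize(kept);
    dest.weight.resize(kept);
    dest.layer_ID.resize(kept);
}
```

Zipping the three vectors keeps the fields of each edge together, so one pass filters all of them consistently. If the intent is instead to drop the edges of the target layer, the predicate's comparison would be inverted.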

CUDA Thrust - Counting matching subarrays

Question: I'm trying to figure out whether it's possible to efficiently calculate the conditional entropy of a set of numbers using CUDA. You can calculate the conditional entropy by dividing an array into windows, then counting the number of matching subarrays/substrings for different lengths. For each subarray length, you calculate the entropy by adding together the matching subarray counts times the log of those counts. Then, whatever you get as the minimum entropy is the conditional entropy. To give a

fp16 support in cuda thrust

Question: I have not been able to find anything about fp16 support in the Thrust CUDA template library. Even the roadmap page has nothing about it: https://github.com/thrust/thrust/wiki/Roadmap But I assume somebody has probably figured out how to overcome this problem, since fp16 support in CUDA has been around for more than six months. As of today, I rely heavily on Thrust in my code and have templated nearly every class I use in order to ease fp16 integration. Unfortunately, absolutely nothing works out of the box

How to force a functor to see an entire thrust::vector so that sorting is possible?

Question: I'm new to CUDA and having a little trouble with functors. I am trying to pass a thrust::vector of thrust::vectors into a functor. Currently I can pass in a vector, do something to each element, and return the modified vector using thrust::for_each. But if I wanted to sort a vector inside a functor, I would need to pass in the whole vector at once so the functor can act on it as a whole. Is there a way to do this? The code below compiles, but does not return the vector sorted.