thrust

How to use Thrust to sort the rows of a matrix?

 ̄綄美尐妖づ submitted on 2019-12-27 11:46:07
Question: I have a 5000x500 matrix and I want to sort each row separately with CUDA. I can use ArrayFire, but its implementation is just a for loop over thrust::sort, which is unlikely to be efficient. https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cuda/kernel/sort.hpp for(dim_type w = 0; w < val.dims[3]; w++) { dim_type valW = w * val.strides[3]; for(dim_type z = 0; z < val.dims[2]; z++) { dim_type valWZ = valW + z * val.strides[2]; for(dim_type y = 0; y < val.dims[1]; y++) { dim_type valOffset =
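
One way to avoid a per-row loop is a two-pass stable sort over the whole matrix. The sketch below is not taken from the question or from ArrayFire; it assumes a row-major matrix stored in a single thrust::device_vector<float>, and the names RowOf and sort_rows are made up for illustration. The first pass sorts all values globally while carrying each value's row index along; the second pass stably sorts by row index, which regroups the values by row and keeps them ordered within each row.

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

// Hypothetical helper: maps a flat row-major index to its row number.
struct RowOf {
    int ncols;
    __host__ __device__ int operator()(int i) const { return i / ncols; }
};

// Hypothetical helper: sorts every row of a row-major nrows x ncols
// matrix in ascending order, in place.
void sort_rows(thrust::device_vector<float>& m, int nrows, int ncols) {
    // row[i] holds the row number of element i.
    thrust::device_vector<int> row(nrows * ncols);
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(nrows * ncols),
                      row.begin(), RowOf{ncols});

    // Pass 1: sort all values globally, carrying the row numbers along.
    thrust::stable_sort_by_key(m.begin(), m.end(), row.begin());
    // Pass 2: stable sort by row number; values stay ordered within a row.
    thrust::stable_sort_by_key(row.begin(), row.end(), m.begin());
}
```

For the 5000x500 case this would be called as sort_rows(m, 5000, 500). It costs two full sorts plus one extra int per element, so whether it beats a loop of per-row sorts is something to measure rather than assume.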

thrust set operations not compiling [duplicate]

丶灬走出姿态 submitted on 2019-12-25 00:27:14
Question: This question already has an answer here: thrust set difference fails to compile with "calling a __host__ function from a __host__ __device__ function is not allowed" (1 answer). Closed last year. I tried a simple program using Thrust's set operations. It finds the difference of two sets. However, I get a compilation error. #include <thrust/set_operations.h> #include <thrust/execution_policy.h> #include <thrust/device_vector.h> int main() { thrust::device_vector<int> A1(7); thrust::device_vector<int> A2(5);

How to sort with less precision on keys with Thrust library

删除回忆录丶 submitted on 2019-12-24 13:03:07
Question: I have a set of integer values that I want to sort using Thrust. Is it possible to use only some of the high or low bits in the sort? If possible, I would rather not use a user-defined comparator, because that switches the algorithm from radix sort to merge sort and increases the elapsed time considerably. I think that when all the numbers have the same value in a given bit, that bit is skipped while sorting, so is it feasible to use the lowest possible number of bits and hope it will be sufficient? (ie:

Thrust reduction and overloaded operator-(const float3&, const float3&) won't compile

心已入冬 submitted on 2019-12-24 10:36:47
Question: I overload operators to have a vector space over float3 (and similar structs) in vectorspace.cuh: // Boilerplate vector space over data type Pt #pragma once #include <type_traits> // float3 __device__ __host__ float3 operator+=(float3& a, const float3& b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; } __device__ __host__ float3 operator*=(float3& a, const float b) { a.x *= b; a.y *= b; a.z *= b; return a; } // float4 __device__ __host__ float4 operator+=(float4& a, const float4& b) { a.x

Thrust inside user written kernels

狂风中的少年 submitted on 2019-12-24 06:41:14
Question: I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code. I would like to know whether I can pass a device_vector to my own kernel. If so, how, and which operations are permitted on it inside kernel/device code? Answer 1: As it was originally written, Thrust is purely a host-side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this: thrust::device_vector< Foo >
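
The answer's code is cut off in this listing, so the following is only a sketch of one common way to hand a device_vector's storage to a kernel, using thrust::raw_pointer_cast. The kernel scale_by_two and the wrapper double_all are hypothetical names chosen for illustration, and int stands in for the answer's Foo type.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

// Hypothetical kernel: doubles each of the n elements in data.
__global__ void scale_by_two(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical wrapper: launches the kernel on the vector's device storage.
void double_all(thrust::device_vector<int>& v) {
    int n = static_cast<int>(v.size());
    if (n == 0) return;

    // The kernel receives a plain pointer; the device_vector itself
    // is not usable inside device code.
    int* raw = thrust::raw_pointer_cast(v.data());

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_by_two<<<blocks, threads>>>(raw, n);
    cudaDeviceSynchronize();  // wait so later host reads see the result
}
```

Inside the kernel only ordinary pointer operations apply: the vector's size has to be passed separately, and the device_vector must outlive the kernel launch because it still owns the memory.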

simple sorting using thrust not working

血红的双手。 submitted on 2019-12-24 05:44:24
Question: I have a CUDA Thrust program as follows: #include <stdio.h> #include<iostream> #include <cuda.h> #include <thrust/sort.h> // main routine that executes on the host int main(void) { int *a_h, *a_d; // Pointer to host & device arrays const int N = 10; // Number of elements in arrays size_t size = N * sizeof(int); a_h = (int *)malloc(size); // Allocate array on host cudaMalloc((void **) &a_d, size);// Allocate array on device std::cout<<"enter the 10 numbers"; // Initialize host array and copy it to CUDA
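
The listing is cut off before the call to thrust::sort, so the actual cause of the failure is not visible here. A frequent pitfall with this kind of setup, though not necessarily the one in this question, is passing raw cudaMalloc pointers straight to thrust::sort: Thrust then treats them as host pointers. A minimal sketch of the usual workaround wraps the pointer in a thrust::device_ptr; the function name sort_device_array is made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts n ints that live in cudaMalloc'd memory.
void sort_device_array(int* a_d, int n) {
    // Tag the raw pointer as device memory so Thrust runs the sort on the GPU.
    thrust::device_ptr<int> first = thrust::device_pointer_cast(a_d);
    thrust::sort(first, first + n);
}
```

With the question's variable names, the call would be sort_device_array(a_d, N), followed by a cudaMemcpy back to a_h before printing.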

Thrust filter by key value

心不动则不痛 submitted on 2019-12-24 05:38:10
Question: In my application I have a class like this: class sample{ thrust::device_vector<int> edge_ID; thrust::device_vector<float> weight; thrust::device_vector<int> layer_ID; /*functions, zip_iterators etc. */ }; At a given index, every vector stores the corresponding data of the same edge. I want to write a function that filters out all the edges of a given layer, something like this: void filter(const sample& src, sample& dest, const int& target_layer){ for(...){ if( src.layer_ID[x] == target_layer
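
The loop in the question could plausibly be replaced by the stencil form of thrust::copy_if over zipped iterators. This is a sketch under assumptions: EdgeSet is a hypothetical stand-in for the question's sample class with its three vectors made public, InLayer is a made-up predicate, and src and dest are assumed to be different objects.

```cuda
#include <cstddef>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Hypothetical stand-in for the question's `sample` class (public members).
struct EdgeSet {
    thrust::device_vector<int> edge_ID;
    thrust::device_vector<float> weight;
    thrust::device_vector<int> layer_ID;
};

// Hypothetical predicate: true when a layer ID equals the target layer.
struct InLayer {
    int target;
    __host__ __device__ bool operator()(int layer) const { return layer == target; }
};

// Copies every edge of src whose layer equals target_layer into dest.
void filter(const EdgeSet& src, EdgeSet& dest, int target_layer) {
    std::size_t n = src.layer_ID.size();
    dest.edge_ID.resize(n);  // worst case: every edge matches
    dest.weight.resize(n);

    auto in = thrust::make_zip_iterator(
        thrust::make_tuple(src.edge_ID.begin(), src.weight.begin()));
    auto out = thrust::make_zip_iterator(
        thrust::make_tuple(dest.edge_ID.begin(), dest.weight.begin()));

    // layer_ID acts as the stencil: element i is copied when its layer matches.
    auto out_end = thrust::copy_if(in, in + n, src.layer_ID.begin(), out,
                                   InLayer{target_layer});

    std::size_t kept = out_end - out;
    dest.edge_ID.resize(kept);
    dest.weight.resize(kept);
    dest.layer_ID.assign(kept, target_layer);  // every kept edge has this layer
}
```

copy_if is stable, so the surviving edges keep their relative order from src.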

CUDA Thrust - Counting matching subarrays

亡梦爱人 submitted on 2019-12-23 03:35:15
Question: I'm trying to figure out if it's possible to efficiently calculate the conditional entropy of a set of numbers using CUDA. You can calculate the conditional entropy by dividing an array into windows, then counting the number of matching subarrays/substrings for different lengths. For each subarray length, you calculate the entropy by adding together the matching subarray counts times the log of those counts. Then, whatever you get as the minimum entropy is the conditional entropy. To give a

fp16 support in cuda thrust

纵饮孤独 submitted on 2019-12-23 03:11:30
Question: I am not able to find anything about fp16 support in the Thrust CUDA template library. Even the roadmap page has nothing about it: https://github.com/thrust/thrust/wiki/Roadmap But I assume somebody has probably figured out how to overcome this problem, since fp16 support in CUDA has been around for more than 6 months. As of today, I rely heavily on Thrust in my code and have templated nearly every class I use in order to ease fp16 integration. Unfortunately, absolutely nothing works out of the box

How to force a functor to see an entire thrust::vector so that sorting is possible?

好久不见. submitted on 2019-12-23 02:46:13
Question: I'm new to CUDA and having a little trouble with functors. I am trying to input a thrust::vector of thrust::vectors into a functor. Currently I can pass in a vector, do something to each element, and return the modified vector using thrust::for_each, but if I wanted to sort a vector in a functor I would need to pass in the whole vector at once so the functor can act on it as a whole. Is there a way to do this? The code below compiles, but does not return the vector sorted.