thrust | 易学教程

How to sort two arrays/vectors in respect to values in one of the arrays, using CUDA/Thrust

Question: This is a conceptual programming question. To summarize, I have two arrays/vectors and I need to sort one with the changes propagating to the other as well, so that if I sort arrayOne, each swap in the sort is also applied to arrayTwo. Now, I know that std::sort allows you to define a comparison function (for custom objects, I assume), and I was thinking of defining one that swaps arrayTwo at the same time. So what I want is to sort the two vectors based on the values in one
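A comparison function passed to std::sort should only compare; it is not a safe place to swap elements of a second array. One common alternative, sketched below in standard C++, is to sort an index permutation by the key array and then gather both arrays through it. The helper name `sort_pair_by_key` and the element types are assumptions made for illustration. On the GPU, Thrust offers `thrust::sort_by_key(keys_first, keys_last, values_first)` for the same purpose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sorts `keys` ascending and applies the same reordering to `values`.
// Hypothetical helper; both vectors must have the same length.
void sort_pair_by_key(std::vector<int>& keys, std::vector<float>& values) {
    assert(keys.size() == values.size());

    // Start from the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Sort the indices by the key each one refers to.
    // stable_sort keeps elements with equal keys in their input order.
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather both arrays through the sorted permutation.
    std::vector<int> sorted_keys(keys.size());
    std::vector<float> sorted_values(values.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        sorted_keys[i] = keys[order[i]];
        sorted_values[i] = values[order[i]];
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

Calling `sort_pair_by_key(arrayOne, arrayTwo)` leaves arrayOne sorted and arrayTwo rearranged in lockstep with it.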

CUDA thrust zip_iterator tuple transform_reduce

Question: I want to compute ||a - b|| for vectors a and b, where ||v|| denotes the magnitude of the vector v. Since this involves taking the square root of the sum of the squares of the differences between each corresponding component of the two vectors, it should be a highly parallelizable task. I am using CUDA and Thrust, through Cygwin, on Windows 10. Both CUDA and Thrust are in general working. The code below compiles and runs (with nvcc), but only because I have commented out three lines toward the bottom of main,
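The computation described here, the square root of the sum of squared component differences, is a transform followed by a reduction. A minimal host-side sketch in standard C++ uses the generalized form of `std::inner_product`, where the "multiply" step is replaced by a squared difference. The function name `euclidean_distance` is an assumption for illustration. Thrust provides `thrust::inner_product` with the same generalized form, and the zip_iterator plus transform_reduce route named in the title is another way to express the same pattern.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Euclidean distance between two equally sized vectors.
// Hypothetical helper name, not taken from the original question.
double euclidean_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());

    // Transform step: squared difference of corresponding components.
    // Reduce step: plain addition, starting from 0.0.
    const double sum_sq = std::inner_product(
        a.begin(), a.end(), b.begin(), 0.0,
        [](double acc, double term) { return acc + term; },
        [](double x, double y) { return (x - y) * (x - y); });

    return std::sqrt(sum_sq);
}
```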

Poor performance when calling cudaMalloc with 2 GPUs simultaneously

Question: I have an application where I split the processing load among the GPUs on a user's system. Basically, there is one CPU thread per GPU, and each thread initiates a GPU processing interval when triggered periodically by the main application thread. Consider the following image (generated using NVIDIA's CUDA profiler tool) for an example of a GPU processing interval; here the application is using a single GPU. As you can see, a big portion of the GPU processing time is consumed by the two sorting operations

How can I find row to all rows distance matrix between two matrices W and X in Thrust or Cublas?

Question: I have the following MATLAB code: tempx = full(sum(X.^2, 2)); tempc = full(sum(C.^2, 2).'); D = -2*(X * C.'); D = bsxfun(@plus, D, tempx); D = bsxfun(@plus, D, tempc); where X is an n x m matrix and W (C in the code) is a k x m matrix. One is the data and the other is the weight matrix. The code computes the distance matrix D. I am looking for an efficient cuBLAS or Thrust implementation of these operations. I managed to implement the line D = -2*(X * C.'); with cuBLAS, but as a newcomer I do not know how to implement the remaining part.
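For reference, the MATLAB code computes squared Euclidean distances through the expansion |x - c|^2 = |x|^2 + |c|^2 - 2 x·c: the product term is a matrix multiplication, and the two bsxfun calls add the squared row norms along rows and columns. The sketch below is a plain host-side C++ version of that arithmetic, useful for checking a GPU implementation. The function name and the row-major flat storage are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// X is n x m and C is k x m, both row-major in flat vectors (an assumed layout).
// Returns the n x k matrix D with D[i*k + j] = |x_i|^2 + |c_j|^2 - 2 * dot(x_i, c_j).
std::vector<double> squared_distance_matrix(const std::vector<double>& X,
                                            const std::vector<double>& C,
                                            std::size_t n, std::size_t k, std::size_t m) {
    assert(X.size() == n * m && C.size() == k * m);

    // tempx and tempc in the MATLAB code: squared norm of every row.
    std::vector<double> tempx(n, 0.0), tempc(k, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t d = 0; d < m; ++d) tempx[i] += X[i * m + d] * X[i * m + d];
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t d = 0; d < m; ++d) tempc[j] += C[j * m + d] * C[j * m + d];

    std::vector<double> D(n * k);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            // Entry (i, j) of X * C.' (the part a GEMM call would compute).
            double dot = 0.0;
            for (std::size_t d = 0; d < m; ++d) dot += X[i * m + d] * C[j * m + d];
            // The two bsxfun(@plus, ...) steps: add row norm and column norm.
            D[i * k + j] = -2.0 * dot + tempx[i] + tempc[j];
        }
    }
    return D;
}
```

On the GPU, the two broadcast additions are an element-wise pass over D in which entry (i, j) receives tempx[i] + tempc[j].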

thrust: fill isolate space

Question: I have an array like this: 0 0 0 1 0 0 0 0 5 0 0 3 0 0 0 8 0 0 I want every non-zero element to expand one element at a time until it reaches another non-zero element, so that the result looks like this: 1 1 1 1 1 1 5 5 5 5 3 3 3 3 8 8 8 8 Is there any way to do this using Thrust? Answer 1: Is there any way to do this using thrust? Yes, here is one possible approach. For each position in the sequence, compute 2 distances. The first is the distance to the nearest non-zero value in the left
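The two-distance idea from the answer can be sketched on the host in standard C++: one forward sweep records the nearest non-zero position to the left of each element, one backward sweep records the nearest to the right, and each element then takes the value of the closer one. The function name is hypothetical, and resolving ties in favor of the left neighbor is an assumption chosen because it reproduces the expected output in the question. The two sweeps are scan-like passes, which is the kind of operation Thrust parallelizes.

```cpp
#include <cstddef>
#include <vector>

// Replaces every element by the value of its nearest non-zero element.
// Ties go to the left neighbor (an assumption that matches the question's example).
std::vector<int> fill_from_nearest_nonzero(const std::vector<int>& in) {
    const std::size_t n = in.size();
    const std::size_t none = n;  // sentinel: no non-zero element on that side
    std::vector<std::size_t> left(n, none), right(n, none);

    // Forward sweep: index of the nearest non-zero at or before i.
    std::size_t last = none;
    for (std::size_t i = 0; i < n; ++i) {
        if (in[i] != 0) last = i;
        left[i] = last;
    }

    // Backward sweep: index of the nearest non-zero at or after i.
    last = none;
    for (std::size_t i = n; i-- > 0;) {
        if (in[i] != 0) last = i;
        right[i] = last;
    }

    std::vector<int> out(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        const bool has_left = left[i] != none;
        const bool has_right = right[i] != none;
        if (has_left && has_right) {
            // Compare the two distances and take the closer source.
            out[i] = (i - left[i] <= right[i] - i) ? in[left[i]] : in[right[i]];
        } else if (has_left) {
            out[i] = in[left[i]];
        } else if (has_right) {
            out[i] = in[right[i]];
        }
    }
    return out;
}
```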

function as argument of thrust iterator CUDA

I am trying to implement ODE solver routines that run on GPUs, using CUDA Thrust iterators to solve a set of equations on the GPU. Going into the details, here is a small piece of code: #include <thrust/device_vector.h> #include <thrust/transform.h> #include <thrust/sequence.h> #include <thrust/copy.h> #include <thrust/fill.h> #include <thrust/replace.h> #include <thrust/functional.h> #include <thrust/for_each.h> #include <thrust/iterator/zip_iterator.h> #include <iostream> #include <math.h> __host__ __device__ float f(float x, float y) { return cos(y)*sin(x);

Segmented Sort with CUDPP/Thrust

Question: Is it possible to do a segmented sort with CUDPP in CUDA? By segmented sort, I mean sorting the elements of an array that are delimited by flags, as shown below. A[10,9,8,7,6,5,4,3,2,1] Flag array[1,0,1,0,0,1,0,0,0,0] Sort the elements of A that lie between consecutive 1s. Expected output [9,10,6,7,8,1,2,3,4,5] Answer 1: You can do this in a single sorting pass: the idea is to adjust the elements in your array such that the sort will relocate elements only within the "segments". For your example: A[10,9,8,7,6,5,4,3
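The answer's exact adjustment is cut off above, so the following is only one way to realize "a single sort that never moves elements across segments", shown in standard C++ on the host. An inclusive prefix sum over the flag array gives every element a segment id, and sorting (segment id, value) pairs lexicographically then orders values within each segment while leaving the segments in place. The function name is hypothetical. The same steps map onto device-side scan and sort primitives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sorts `values` ascending within each segment; a flag of 1 marks a segment start.
// Hypothetical helper illustrating the segment-id approach.
std::vector<int> segmented_sort(const std::vector<int>& values, const std::vector<int>& flags) {
    assert(values.size() == flags.size());

    // Inclusive prefix sum of the flags: each element gets its segment id.
    std::vector<int> segment(flags.size());
    std::partial_sum(flags.begin(), flags.end(), segment.begin());

    // Pair every value with its segment id. std::pair compares lexicographically,
    // so a single sort orders by segment first and by value second.
    std::vector<std::pair<int, int>> keyed(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        keyed[i] = std::make_pair(segment[i], values[i]);
    }
    std::sort(keyed.begin(), keyed.end());

    // Segment ids are non-decreasing in the input, so every segment keeps its
    // position and only the values inside it are reordered.
    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < keyed.size(); ++i) {
        out[i] = keyed[i].second;
    }
    return out;
}
```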

Thrust: How to directly control where an algorithm invocation executes?

Question: The following code contains no information that would determine whether it runs on the CPU or the GPU. Where is the "reduce" operation executed? #include <thrust/iterator/counting_iterator.h> ... // create iterators thrust::counting_iterator<int> first(10); thrust::counting_iterator<int> last = first + 3; first[0] // returns 10 first[1] // returns 11 first[100] // returns 110 // sum of [first, last) thrust::reduce(first, last); // returns 33 (i.e. 10 + 11 + 12) Furthermore, thrust::transform_reduce( thrust:

thrust::device_vector error

I'm new to Thrust. I'm trying to copy from a thrust::host_vector to a thrust::device_vector, both with element type Sequence, which is a class I have already implemented. However, I get an "Invalid Device Function" error. I'm using CUDA 4.0 with VS2010 on a GeForce GT 540. thrust::host_vector <Sequence> Ind_Tabel_V; void Ind_Table_Filling() { //some Code Sequence s; // some code Ind_Tabel_V.push_back(s); try { thrust::device_vector<Sequence> d_vec=Ind_Tabel_V; } catch (thrust::system_error &e) { std::cerr << "Error accessing vector element: " << e.what() << std::endl; } } Can anyone help please? That error

Understanding Thrust (CUDA) memory usage

I'm using the CUDA/Thrust library to do some Monte Carlo simulations. This works very well up to a certain number of simulations, at which point I get a bad_alloc exception. This seems reasonable, because an increasing number of simulations in my code means handling increasingly large device_vectors, so I expect this kind of exception to show up at some point. What I'd like to do now is set an upper limit on the number of simulations based on the available memory on my GPU. Then I could split the workload into batches of simulations. So I've been trying to size my problem before launching my set of