thrust

function as argument of thrust iterator CUDA

北城余情 submitted on 2019-12-08 09:25:25
Question: I am trying to implement ODE solver routines that run on the GPU, using CUDA Thrust iterators to solve a batch of equations. Here is a small piece of the code:
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <thrust/fill.h>
#include <thrust/replace.h>
#include <thrust/functional.h>
#include <thrust/for_each.h>
#include <thrust/iterator/zip_iterator.h>

Computing all-pairs distances between points in different sets with CUDA

半腔热情 submitted on 2019-12-08 08:15:54
Question: I am trying to implement a brute-force distance computation algorithm in CUDA.
#define VECTOR_DIM 128
thrust::device_vector<float> feature_data_1;
feature_data_1.resize(VECTOR_DIM * 1000); // 1000 128-dimensional points
thrust::device_vector<float> feature_data_2;
feature_data_2.resize(VECTOR_DIM * 2000); // 2000 128-dimensional points
Now what I would like to do is compute the squared L2 distances (sum of the squared differences) from every vector in the first matrix to every vector in the second
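A CPU reference is often useful for checking the output of a brute-force GPU kernel. The following is a minimal sketch in plain host C++ (no Thrust, so it runs without a GPU), assuming row-major storage where point i occupies elements [i*dim, (i+1)*dim). The function name `all_pairs_squared_l2` and the output layout are illustrative assumptions, not part of the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for all-pairs squared L2 distances.
// a holds n_a points and b holds n_b points, each with `dim` floats, row-major.
// Result layout (an assumption of this sketch): out[i * n_b + j] is the
// distance between point i of a and point j of b.
std::vector<float> all_pairs_squared_l2(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        std::size_t dim) {
    assert(dim > 0 && a.size() % dim == 0 && b.size() % dim == 0);
    const std::size_t n_a = a.size() / dim;
    const std::size_t n_b = b.size() / dim;
    std::vector<float> out(n_a * n_b, 0.0f);
    for (std::size_t i = 0; i < n_a; ++i) {
        for (std::size_t j = 0; j < n_b; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < dim; ++k) {
                const float d = a[i * dim + k] - b[j * dim + k];
                sum += d * d;  // accumulate the squared difference
            }
            out[i * n_b + j] = sum;
        }
    }
    return out;
}
```

With the sizes from the question (1000 and 2000 points), the result has 2,000,000 entries, which is small enough to compare element by element against a GPU result using a floating-point tolerance.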

vectorized upper bound for segmented data in CUDA / thrust

梦想与她 submitted on 2019-12-08 04:33:47
Question: I have the following input data:
e = 0 0 0 0 0 0 | 1 1 1
t = 1 1 4 4 4 5 | 1 6 7
i = 0 1 2 3 4 5 | 6 7 8 // indices from [0,n-1]
The data is first sorted by e, then by t. e is the key that identifies segments in the data. In this case:
segment_0 = [0,5]
segment_1 = [6,8]
Each segment is again segmented by t. In this case:
sub_segment_0_0 = [0,1] // t==1
sub_segment_0_1 = [2,4] // t==4
sub_segment_0_2 = [5,5] // t==5
sub_segment_1_0 = [6,6] // t==1
sub_segment_1_1 = [7,7] // t==6
sub

How to partly sort arrays on CUDA?

让人想犯罪 __ submitted on 2019-12-08 03:40:15
Question: Problem: Suppose I have two arrays: const int N = 1000000; float A[N]; myStruct *B[N]; The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I partly sort A on the GPU so that all positive values come first, followed by the negative values, with no ordering required within each group (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be reordered to match A (A holds the keys, B the values). Since A and B can be very large, I think the sort algorithm should be
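The reordering described here is a partition, not a full sort. The sketch below shows the idea on the host with `std::stable_partition`, applied to an index array so that keys and values move together. Thrust offers `thrust::partition` and `thrust::stable_partition`, which mirror the STL algorithms, so the same structure should carry over to device vectors. The helper name `partition_by_sign` and the choice to group zero with the non-negative values (as in the question's examples) are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Moves all keys >= 0 to the front (keeping their relative order) and all
// negative keys to the back, applying the same permutation to `values`.
// Returns the number of keys in the front group.
template <typename Value>
std::size_t partition_by_sign(std::vector<float>& keys,
                              std::vector<Value>& values) {
    assert(keys.size() == values.size());

    // Partition a list of indices instead of the data itself, so that one
    // permutation can be applied to both arrays.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    auto mid = std::stable_partition(
        order.begin(), order.end(),
        [&keys](std::size_t i) { return keys[i] >= 0.0f; });

    // Gather keys and values through the permutation.
    std::vector<float> new_keys(keys.size());
    std::vector<Value> new_values(values.size());
    for (std::size_t dst = 0; dst < order.size(); ++dst) {
        new_keys[dst] = keys[order[dst]];
        new_values[dst] = values[order[dst]];
    }
    keys.swap(new_keys);
    values.swap(new_values);
    return static_cast<std::size_t>(mid - order.begin());
}
```

A partition runs in linear time, whereas a full sort by key does more work than the stated requirement needs.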

convert CUDA device interleaved array to tuple for vector operations

随声附和 submitted on 2019-12-08 03:24:57
Question: How do I convert a device array that contains interleaved floats to a CUDA Thrust tuple for Thrust vector operations? Purpose: I generate a crude list of vertices using Marching Cubes on CUDA. The output is a list of vertices, with redundancy and no connectivity. I wish to get a list of unique vertices and then an index buffer into these unique vertices, so I can perform operations such as mesh simplification. float *devPtr; // this is a device pointer that holds an array of floats /
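The goal of the question (unique vertices plus an index buffer) can be sketched on the host as sort, unique, then a binary search per original vertex. The code below is a plain C++ reference rather than a Thrust implementation. It assumes the interleaved layout is x, y, z per vertex and that duplicates are bit-identical floats. The names `Vertex`, `IndexedMesh`, and `build_index_buffer` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vertex = std::array<float, 3>;  // assumed layout: x, y, z

struct IndexedMesh {
    std::vector<Vertex> unique_vertices;  // sorted lexicographically
    std::vector<std::size_t> indices;     // one entry per input vertex
};

IndexedMesh build_index_buffer(const std::vector<float>& interleaved) {
    assert(interleaved.size() % 3 == 0);
    const std::size_t n = interleaved.size() / 3;

    // Regroup the interleaved floats into one record per vertex.
    std::vector<Vertex> vertices(n);
    for (std::size_t i = 0; i < n; ++i) {
        vertices[i] = Vertex{{interleaved[3 * i], interleaved[3 * i + 1],
                              interleaved[3 * i + 2]}};
    }

    // Sort a copy and drop exact duplicates.
    IndexedMesh mesh;
    mesh.unique_vertices = vertices;
    std::sort(mesh.unique_vertices.begin(), mesh.unique_vertices.end());
    mesh.unique_vertices.erase(
        std::unique(mesh.unique_vertices.begin(), mesh.unique_vertices.end()),
        mesh.unique_vertices.end());

    // Each original vertex maps to its position in the unique list.
    mesh.indices.reserve(n);
    for (const Vertex& v : vertices) {
        auto it = std::lower_bound(mesh.unique_vertices.begin(),
                                   mesh.unique_vertices.end(), v);
        mesh.indices.push_back(
            static_cast<std::size_t>(it - mesh.unique_vertices.begin()));
    }
    return mesh;
}
```

The same three steps have Thrust counterparts (`thrust::sort`, `thrust::unique`, `thrust::lower_bound`), so this reference can also serve as a check for a device-side version.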

thrust::device_vector error

天涯浪子 submitted on 2019-12-08 02:07:39
Question: I'm new to Thrust. I'm trying to copy from a thrust::host_vector to a thrust::device_vector, both holding elements of type Sequence, a class I have already implemented. However, I get the error "Invalid Device Function". I'm using CUDA 4.0 with VS2010 on a GeForce GT 540.
thrust::host_vector<Sequence> Ind_Tabel_V;
void Ind_Table_Filling() {
    // some code
    Sequence s;
    // some code
    Ind_Tabel_V.push_back(s);
    try {
        thrust::device_vector<Sequence> d_vec = Ind_Tabel_V;
    } catch (thrust::system_error &e) {
        std::cerr <<

Understanding Thrust (CUDA) memory usage

别来无恙 submitted on 2019-12-07 17:43:36
Question: I'm using the CUDA Thrust library to do some Monte Carlo simulations. This works very well up to a certain number of simulations, beyond which I get a bad_alloc exception. That seems reasonable, because a larger number of simulations in my code means handling increasingly large device_vectors, so I expect this kind of exception to show up at some point. What I'd like to do now is set an upper limit on the number of simulations based on the available memory on my GPU. Then, I could split the
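The batching idea can be sketched as simple arithmetic once the free device memory is known. The CUDA runtime reports it through `cudaMemGetInfo`. The function below takes that number as a plain parameter so it runs without a GPU. The names `BatchPlan` and `plan_batches`, the per-simulation byte count, and the `usable_fraction` headroom factor are all assumptions of this sketch. Headroom is advisable because some Thrust algorithms allocate temporary storage of their own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct BatchPlan {
    std::size_t batch_size;   // simulations per batch (0 if none fit)
    std::size_t num_batches;  // batches needed to cover all simulations
};

// free_bytes: free device memory, e.g. as reported by cudaMemGetInfo.
// bytes_per_simulation: device memory one simulation needs (caller's estimate).
// usable_fraction: share of free memory the caller is willing to use, in (0, 1].
BatchPlan plan_batches(std::size_t total_simulations,
                       std::size_t bytes_per_simulation,
                       std::size_t free_bytes,
                       double usable_fraction) {
    assert(bytes_per_simulation > 0);
    assert(usable_fraction > 0.0 && usable_fraction <= 1.0);

    const std::size_t budget =
        static_cast<std::size_t>(free_bytes * usable_fraction);
    const std::size_t max_per_batch = budget / bytes_per_simulation;
    if (max_per_batch == 0 || total_simulations == 0) {
        return BatchPlan{0, 0};  // nothing fits, or nothing to do
    }

    const std::size_t batch = std::min(total_simulations, max_per_batch);
    const std::size_t num = (total_simulations + batch - 1) / batch;  // ceil
    return BatchPlan{batch, num};
}
```

The last batch may be smaller than `batch_size`; the caller would process `total_simulations - (num_batches - 1) * batch_size` simulations in it.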

CUDA Thrust library: How can I create a host_vector of host_vectors of integers?

限于喜欢 submitted on 2019-12-07 16:49:55
Question: In C++, to create a vector that holds 10 vectors of integers, I would do the following: std::vector< std::vector<int> > test(10); Since I thought Thrust followed the same logic as the STL, I tried doing the same: thrust::host_vector< thrust::host_vector<int> > test(10); However, I got many confusing errors. I tried: thrust::host_vector< thrust::host_vector<int> > test; and it worked, but I can't add anything to this vector. Doing thrust::host_vector<int> temp(3); test.push
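One common way to sidestep nested containers is to store all rows in a single flat array alongside an array of row offsets. The sketch below shows that layout with `std::vector` in plain C++. The same two arrays could be held in `thrust::host_vector` or `thrust::device_vector` instead. The type `FlatRows` and its member names are hypothetical and not part of Thrust.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rows of varying length stored back to back in `data`.
// Row r occupies data[offsets[r]] .. data[offsets[r + 1] - 1].
struct FlatRows {
    std::vector<int> data;
    std::vector<std::size_t> offsets;

    FlatRows() : offsets(1, 0) {}  // start with a single offset of 0

    void push_row(const std::vector<int>& row) {
        data.insert(data.end(), row.begin(), row.end());
        offsets.push_back(data.size());
    }

    std::size_t num_rows() const { return offsets.size() - 1; }

    std::size_t row_size(std::size_t r) const {
        assert(r < num_rows());
        return offsets[r + 1] - offsets[r];
    }

    int at(std::size_t r, std::size_t c) const {
        assert(c < row_size(r));
        return data[offsets[r] + c];
    }
};
```

Because the storage is contiguous, the whole structure can be transferred with two bulk copies, one for the data and one for the offsets.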

Multi GPU usage with CUDA Thrust

。_饼干妹妹 submitted on 2019-12-07 15:20:44
Question: I want to use my two graphics cards for calculations with CUDA Thrust. Running on a single card works well for either card, even when I store two device_vectors in a std::vector. If I use both cards at the same time, the first iteration of the loop works without error. After the first iteration an error occurs, probably because the device pointer is not valid. I am not sure what the exact problem is, or how to use both cards for the calculation. Minimal code sample: std:

Fastest way to access device vector elements directly on host

瘦欲@ submitted on 2019-12-07 08:42:48
Question: I refer you to the following page: http://code.google.com/p/thrust/wiki/QuickStartGuide#Vectors. Please see the second paragraph, where it says: "Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We'll look at some more efficient techniques later." I searched the whole document but could not find the more efficient technique. Does anyone