c++-amp

Improving memory layout for parallel computing

断了今生、忘了曾经 submitted on 2020-01-29 09:45:08

Question: I'm trying to optimize an algorithm (Lattice Boltzmann) for parallel computing using C++ AMP, and I'm looking for suggestions on optimizing the memory layout. I just found out that moving one parameter out of the structure into another vector (the blocked vector) gave an increase of about 10%. Does anyone have tips that could improve this further, or something I should take into consideration? Below is the most time-consuming function, which is executed for each timestep, and the structure used for
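The posted structure is cut off above, but the ~10% gain from splitting the blocked field out is the classic array-of-structures to structure-of-arrays move: kernels that touch only one field stop dragging the unused fields through the cache. A standard-C++ sketch of the idea, with hypothetical field names since the real struct isn't shown:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): all fields of one node sit together in memory,
// so a loop over `density` also loads `velocity` and `blocked` into cache.
struct NodeAoS { double density; double velocity; char blocked; };

// Structure-of-arrays (SoA): each field gets its own contiguous vector.
// Field names are hypothetical; the question's actual struct is not shown.
struct LatticeSoA {
    std::vector<double> density;
    std::vector<double> velocity;
    std::vector<char>   blocked;   // rarely-used flag split out, as in the question
    explicit LatticeSoA(std::size_t n) : density(n), velocity(n), blocked(n) {}
};

double sum_density(const LatticeSoA& l) {
    double s = 0.0;
    for (double d : l.density) s += d;   // contiguous, cache-friendly scan
    return s;
}
```

The same split maps directly onto C++ AMP: each field becomes its own `concurrency::array`, and kernels capture only the fields they need.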

C++ AMP: Computing a gradient using a texture on a 16-bit image

﹥>﹥吖頭↗ submitted on 2020-01-21 15:56:05

Question: I am working with depth images retrieved from a Kinect, which are 16-bit. I ran into some difficulties writing my own filters because of the index or the size of the images. I am working with textures because they allow working with images of any bit size. So I am trying to compute a simple gradient to understand what is wrong, or why it doesn't work as I expected. You can see that there is something wrong when I use the y direction. For x: For y: That's my code: typedef concurrency::graphics::texture<unsigned int
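Since the texture code is truncated, here is a serial standard-C++ version of a central-difference gradient on a row-major 16-bit image. Running the same indexing on the CPU is a quick way to see whether the y-direction artifacts come from the arithmetic or from the texture access (the differences are signed, so the output type must be wider than `uint16_t`):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// x along a row, y across rows; the image is flattened row-major as img[y*w + x].
std::vector<int> gradient_x(const std::vector<uint16_t>& img, int w, int h) {
    std::vector<int> out(static_cast<std::size_t>(w) * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 1; x + 1 < w; ++x)          // skip the border columns
            out[y * w + x] = static_cast<int>(img[y * w + x + 1])
                           - static_cast<int>(img[y * w + x - 1]);
    return out;
}

std::vector<int> gradient_y(const std::vector<uint16_t>& img, int w, int h) {
    std::vector<int> out(static_cast<std::size_t>(w) * h, 0);
    for (int y = 1; y + 1 < h; ++y)              // skip the border rows
        for (int x = 0; x < w; ++x)
            out[y * w + x] = static_cast<int>(img[(y + 1) * w + x])
                           - static_cast<int>(img[(y - 1) * w + x]);
    return out;
}
```

In the AMP kernel the equivalent reads would use `idx` offsets of (0, ±1) for x and (±1, 0) for y; mixing up which extent component is rows and which is columns is a frequent cause of a "wrong in y only" result.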

Controlling the index variables in C++ AMP

孤街浪徒 submitted on 2020-01-06 12:45:50

Question: I have just started trying C++ AMP and decided to give it a shot with the project I am currently working on. At some point I have to build a distance matrix for the vectors I have, and I have written the code below for this: unsigned int samplesize=samplelist.size(); unsigned int vs = samplelist.front().size(); vector<double> samplevec(samplesize*vs); vector<double> distancevec(samplesize*samplesize,0); it1=samplelist.begin(); for(int i=0 ; i<samplesize; ++j){ for(int j = 0 ; j<vs ; ++j){
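The loops above flatten the sample list; a serial standard-C++ sketch of the full distance-matrix computation in that same flattened layout may help pin down the indexing (the GPU version would parallelize over the (i, j) pairs, with each thread's 2D `index` mapped to these same offsets):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample i's vector lives at samplevec[i*vs .. i*vs + vs - 1], and the
// Euclidean distance between samples i and j lands at d[i*samplesize + j],
// matching the question's samplevec/distancevec layout.
std::vector<double> distance_matrix(const std::vector<double>& samplevec,
                                    std::size_t samplesize, std::size_t vs) {
    std::vector<double> d(samplesize * samplesize, 0.0);
    for (std::size_t i = 0; i < samplesize; ++i)
        for (std::size_t j = 0; j < samplesize; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < vs; ++k) {
                double diff = samplevec[i * vs + k] - samplevec[j * vs + k];
                s += diff * diff;
            }
            d[i * samplesize + j] = std::sqrt(s);
        }
    return d;
}
```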

Reading Buffer Data using C++ AMP

扶醉桌前 submitted on 2019-12-24 12:34:36

Question: I can't seem to find a way to read the data from my AMP array. What I want to be able to do is take my buffer, copy it into a vector, and then use the vector. I'm aware that I should set the CPU access flags, but I'm having trouble doing so. First, this is how I'm trying to access the buffer; I'm putting it here in case I have done something the way it shouldn't be done. Perhaps there is a built-in function I've missed that does this for me? std::vector<Pticle>
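For reference, `concurrency::copy(gpu_array, host_vec.begin())` is the usual way to pull an AMP array back into a host vector, and it shouldn't require CPU access flags on the array. The iterator-based shape of that call, sketched in standard C++ with a stand-in particle type:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the question's particle type; the real Pticle layout isn't shown.
struct Pticle { float x, y, z; };

// Copy-out pattern: size the destination vector first, then copy through an
// output iterator. With C++ AMP the same shape is
//   std::vector<Pticle> host(n);
//   concurrency::copy(gpu_array, host.begin());
// which waits for pending kernels and transfers the data back to the CPU.
std::vector<Pticle> read_back(const Pticle* device_data, std::size_t n) {
    std::vector<Pticle> host(n);   // destination must already hold n elements
    std::copy(device_data, device_data + n, host.begin());
    return host;
}
```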

Behavior of operator[] on 1D and 2D arrays in C++ AMP

可紊 submitted on 2019-12-24 02:07:15

Question: I've encountered a very strange exception when writing code in C++ AMP. I define two concurrency::array objects as follows: concurrency::array<float, 2> img_amp_data(11, 11, image_data.begin()); concurrency::array<float> a_amp_result(121, empty_vec.begin()); When I want to access the elements of the first of them, std::cout << img_amp_data[0][0] << std::endl; everything runs properly, but when I want to access the second one, std::cout << a_amp_result[0] << std::endl; I get the following
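The exception text is truncated above, but one thing worth checking is whether the array is CPU-accessible at all: host-side `operator[]` on a `concurrency::array` stored on a discrete GPU can throw, whereas a `concurrency::array_view` synchronizes data to the host automatically. Independent of that, it helps to confirm the 1D and 2D views really address the same elements; a standard-C++ check of the row-major correspondence, assuming the question's 11x11 layout flattened into 121 floats:

```cpp
#include <cassert>
#include <vector>

constexpr int W = 11, H = 11;   // dimensions from the question

// Element (r, c) of the 2D view and element r*W + c of the flattened 1D view
// must refer to the same value when the data is stored row-major.
float at2d(const std::vector<float>& v, int r, int c) { return v[r * W + c]; }
float at1d(const std::vector<float>& v, int i)        { return v[i]; }
```

Note that in AMP, `operator[]` on a 2D array takes an `index<2>` (or chains two subscripts through a projection), while on a 1D array it returns the element directly; the shapes differ, but both still require the data to be reachable from the CPU.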

Copy data from GPU to CPU

风格不统一 submitted on 2019-12-22 08:24:38

Question: I am trying to calculate a matrix using C++ AMP. I use an array with a width and height of 3000 x 3000, and I repeat the calculation procedure 20000 times: //_height=_width=3000 extent<2> ext(_height,_width); array<int, 2> GPU_main(ext,gpuDevice.default_view); array<int, 2> GPU_res(ext,gpuDevice.default_view); copy(_main, GPU_main); array_view<int,2> main(GPU_main); array_view<int,2> res(GPU_res); res.discard_data(); number=20000; for(int i=0;i<number;i++) { parallel_for_each(e,[=](index<2> idx
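One pattern worth considering for a loop of 20000 iterations is ping-pong buffering: keep two device arrays and swap their roles each iteration instead of copying results back into the input, so the only real host transfer is the final read-back after the loop. A serial standard-C++ sketch with a stand-in kernel:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the GPU kernel: reads one buffer, fully writes the other.
void step(const std::vector<int>& in, std::vector<int>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + 1;
}

std::vector<int> run(std::vector<int> main_buf, int iterations) {
    std::vector<int> res_buf(main_buf.size());
    for (int i = 0; i < iterations; ++i) {
        step(main_buf, res_buf);
        std::swap(main_buf, res_buf);   // output becomes the next input; O(1), no copy
    }
    return main_buf;                    // read back to the host only once
}
```

With AMP array_views the swap is similarly cheap (handles, not data); the sketch assumes, as the question's code suggests, that each iteration reads one buffer and overwrites the other completely.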

What is the current status of C++ AMP [closed]

核能气质少年 submitted on 2019-12-20 09:29:20

Question: Closed. This question needs to be more focused and is not currently accepting answers. Closed 2 years ago. I am working on high-performance code in C++ and have been using both CUDA and OpenCL, and more recently C++ AMP, which I like very much. I am, however, a little worried that it is not being developed and extended and will die out. What leads me to this thought is that even the MS C+

will array_view.synchronize_asynch wait for parallel_for_each completion?

梦想的初衷 submitted on 2019-12-20 04:41:28

Question: If I have a concurrency::array_view being operated on in a concurrency::parallel_for_each loop, my understanding is that I can continue other tasks on the CPU while the loop is executing: using namespace Concurrency; array_view<int> av; parallel_for_each(extent<1>(number),[=](index<1> idx) { // do some intense computations on av } // do some stuff on the CPU while we wait av.synchronize(); // wait for the parallel_for_each loop to finish and copy the data But what if I want to not wait for
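As far as I can tell, the `concurrency::completion_future` returned by `array_view::synchronize_asynch` does not complete until the outstanding parallel_for_each has finished and its writes have been copied back, so waiting on it is safe. The start-now-wait-later shape can be sketched with the standard library:

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Stand-in for the GPU work launched by parallel_for_each.
int run_kernel(std::vector<int> data) {
    return std::accumulate(data.begin(), data.end(), 0);
}

// Kick off the "kernel", keep the future, do other CPU work, and block only
// when the result is actually needed. completion_future behaves the same way
// and additionally supports .then() continuations. (std::launch::deferred is
// used so this sketch has no threading requirement; std::launch::async would
// run the work concurrently, matching the GPU scenario.)
int main_flow() {
    std::vector<int> data{1, 2, 3, 4};
    auto done = std::async(std::launch::deferred, run_kernel, data);
    // ... other CPU work goes here, before the result is needed ...
    return done.get();   // the only point that waits
}
```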

C++ AMP: Copying a 16-bit image from texture to texture (from an OpenCV Mat)

Deadly submitted on 2019-12-13 13:41:22

Question: This issue is the next step from this one (link). In brief, I am working with depth images from a Kinect, which are 16-bit. With C++ AMP we have some restrictions on the bit size of the data, so I am trying to use textures to deal with it. Now, I'm sure I am writing to the proper pixel; however, there seems to be some issue retrieving my texture's original data. That's the code: typedef concurrency::graphics::texture<unsigned int, 2> TextureData; typedef concurrency:
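The question keeps the pixels in a `texture<unsigned int, 2>`; a common fallback when 16-bit texture storage misbehaves is to pack two 16-bit depth pixels into each 32-bit word on the host and unpack them in the kernel. A standard-C++ sketch of that packing (the even/odd convention here is an assumption, not from the question):

```cpp
#include <cassert>
#include <cstdint>

// Two 16-bit depth pixels per 32-bit word: the even-indexed pixel in the low
// halfword, the odd-indexed one in the high halfword.
uint32_t pack(uint16_t even_px, uint16_t odd_px) {
    return static_cast<uint32_t>(even_px) | (static_cast<uint32_t>(odd_px) << 16);
}

uint16_t unpack_even(uint32_t w) { return static_cast<uint16_t>(w & 0xFFFFu); }
uint16_t unpack_odd (uint32_t w) { return static_cast<uint16_t>(w >> 16); }
```

The same shifts and masks work inside a restrict(amp) kernel; the cost is that x-indices must be halved on the packed buffer, which is exactly the kind of index arithmetic worth verifying on the CPU first.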