openacc

Speed of Pascal CUDA8 1080Ti unified memory

非 Y 不嫁゛ submitted on 2019-12-30 11:38:10
Question: Thanks to the answers here yesterday, I think I now have a correct basic test of unified memory using a Pascal 1080Ti. It allocates a 50GB single-dimension array and adds it up. If I understand correctly, it should be memory bound since this test is so simple (adding integers). However, it takes 24 seconds, which equates to about 2GB/s. When I run the CUDA8 bandwidthTest I see higher rates: 11.7GB/s pinned and 8.5GB/s pageable. Is there any way to get the test to run faster than 24 seconds? Here's
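The question's own code is cut off in this listing, so the following is only a minimal sketch of the kind of test it describes: an OpenACC sum reduction over an integer array, plus the arithmetic behind the "about 2GB/s" figure. The function names `effective_gb_per_s` and `sum_array` are hypothetical, and 1 GB is assumed to mean 10^9 bytes.

```c
#include <stddef.h>

/* Effective bandwidth in GB/s, assuming 1 GB = 1e9 bytes.
   For the numbers quoted in the question: 50e9 bytes / 24 s is roughly 2.08 GB/s. */
double effective_gb_per_s(double bytes, double seconds) {
    return bytes / seconds / 1e9;
}

/* Sum an int array with an OpenACC reduction (hypothetical helper).
   A compiler without OpenACC support ignores the pragma and runs the loop serially. */
long long sum_array(const int *a, long n) {
    long long total = 0;
    #pragma acc parallel loop reduction(+:total) copyin(a[0:n])
    for (long i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}
```

Calling `effective_gb_per_s(50e9, 24.0)` reproduces the rate quoted in the question. That rate is below both bandwidthTest figures, which suggests the time is dominated by how the data reaches the GPU rather than by the additions themselves.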

Using OpenACC to parallelize nested loops

五迷三道 submitted on 2019-12-30 03:36:08
Question: I am very new to OpenACC and have only high-level knowledge, so any help and explanation of what I am doing wrong would be appreciated. I am trying to accelerate (parallelize) a not-so-straightforward nested loop that updates a flattened (3D to 1D) array using OpenACC directives. I have posted a simplified sample code below that, when compiled using pgcc -acc -Minfo=accel test.c, gives the following error: call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
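The sample code itself is not included in this listing, so the cause of the error cannot be confirmed here. As a general sketch, assuming a row-major layout, this is one way to update a flattened 3D array inside a collapsed OpenACC loop nest. The names `flat_index` and `scale_grid` are hypothetical. Stating the array extent explicitly in the data clause matters because the compiler cannot infer the size of a bare pointer, and a missing or wrong extent is one common source of illegal-address errors.

```c
#include <stddef.h>

/* Row-major flattened index of element (i, j, k) in an nx * ny * nz grid. */
size_t flat_index(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;
}

/* Multiply every element of the flattened grid by `factor`.
   The three loops are independent, so they can be collapsed into one
   parallel iteration space. */
void scale_grid(float *grid, int nx, int ny, int nz, float factor) {
    size_t n = (size_t)nx * ny * nz;  /* total number of elements */
    #pragma acc parallel loop collapse(3) copy(grid[0:n])
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            for (int k = 0; k < nz; ++k) {
                /* Same formula as flat_index, written inline so that no
                   function call appears inside the compute region. */
                grid[((size_t)i * ny + j) * nz + k] *= factor;
            }
        }
    }
}
```

Without an OpenACC compiler the pragma is ignored and the loops run serially, which makes it easy to compare the accelerated result against the host result.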

OpenACC must have routine information error

本秂侑毒 submitted on 2019-12-25 09:25:10
Question: I am trying to parallelize a simple Mandelbrot C program, yet I get this error, which has to do with not including acc routine information. Also, I am not sure whether I should be copying data in and out of the parallel section. PS I am relatively new to parallel programming, so any advice on learning it would be appreciated. (Warning when compiled) PGC-S-0155-Procedures called in a compute region must have acc routine information: fwrite (mandelbrot.c: 88) PGC-S-0155-Accelerator region
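The diagnostic names `fwrite`, a host I/O function that cannot run inside a compute region. The original mandelbrot.c is not shown in this listing, so the following is only a sketch of one common restructuring: mark the per-pixel function with `acc routine seq`, fill a buffer inside the parallel region, and call `fwrite` on the host afterwards. The names `mandel_iters`, `render` and `write_image`, and the mapped region of the complex plane, are assumptions made for illustration.

```c
#include <stdio.h>

/* Per-pixel escape-time count. The routine directive tells the compiler
   to also build a sequential device version of this function. */
#pragma acc routine seq
int mandel_iters(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}

/* Fill a w * h byte image. Only the output buffer needs to come back
   from the device, hence copyout rather than copy. */
void render(unsigned char *img, int w, int h, int max_iter) {
    #pragma acc parallel loop collapse(2) copyout(img[0:w*h])
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            double cr = -2.0 + 3.0 * x / w;  /* assumed view: real axis [-2, 1) */
            double ci = -1.5 + 3.0 * y / h;  /* assumed view: imaginary axis [-1.5, 1.5) */
            img[y * w + x] = (unsigned char)(mandel_iters(cr, ci, max_iter) % 256);
        }
    }
}

/* File output stays on the host, outside any compute region. */
size_t write_image(FILE *out, const unsigned char *img, int w, int h) {
    return fwrite(img, 1, (size_t)w * h, out);
}
```

On the data question: in this sketch the image is write-only on the device, so `copyout` avoids transferring the uninitialized buffer to the GPU first.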

OpenACC must have routine information error

本小妞迷上赌 submitted on 2019-12-25 09:22:08
Question: I am trying to parallelize a simple Mandelbrot C program, yet I get this error, which has to do with not including acc routine information. Also, I am not sure whether I should be copying data in and out of the parallel section. PS I am relatively new to parallel programming, so any advice on learning it would be appreciated. (Warning when compiled) PGC-S-0155-Procedures called in a compute region must have acc routine information: fwrite (mandelbrot.c: 88) PGC-S-0155-Accelerator region

not able to fill the array allocated on the gpu

有些话、适合烂在心里 submitted on 2019-12-24 19:43:02
Question: Please help me. I have the following code ... #include <accelmath.h> #include <openacc.h> const long int G=100000; const unsigned int GL=100000; const long int K=G; const int LE=1.0f; struct Particle { float x; float rs; }; Particle particles[GL]; int sort[GL]; int ind01[GL]; long int MAX_ELEMENT=1; int POSITION1; int POSITION0; int LIFE=0; bool start=true; int mini; int count0; int count1; int GL1; int js; #pragma acc declare device_resident(ind01,POSITION0,POSITION1,mini,GL1,js,MAX_ELEMENT

how to solve pgcc&openacc linker error “__pgi_uacc_multicorestart”, “__pgi_uacc_multicoreend”

末鹿安然 submitted on 2019-12-24 18:16:18
Question: I am trying to parallelize my C program with OpenACC 2.5 on Ubuntu 16.04 LTS. After a simple modification, which is just adding one line, I can compile all my .c files to .o files. In the linking step, the pgcc compiler reports undefined reference to `__pgi_uacc_multicorestart' and undefined reference to `__pgi_uacc_multicoreend'. A Google search shows nothing related to these error messages. Please help me with this problem. Here is the information and source code related to my system and program. I

OpenACC parallel kernels not getting generated

亡梦爱人 submitted on 2019-12-24 15:09:20
Question: I am developing code with PGC++ in order to accelerate it on a GPU. I am using OpenBabel, which has an Eigen dependency. I have tried using #pragma acc kernel and I have tried using #pragma acc routine. My compilation command is: "pgc++ -acc -ta=tesla -Minfo=all -I/home/pranav/new_installed/include/openbabel-2.0/ -I/home/pranav/new_installed/include/eigen3/ -L/home/pranav/new_installed/lib/openbabel/ main.cpp /home/pranav/new_installed/lib/libopenbabel.so" I am getting the following error: PGCC-S-0155

OpenACC in Visual Studio (Visual C++)

狂风中的少年 submitted on 2019-12-14 03:58:06
Question: I am very new to GPU programming. I want to write GPU code in C++ using OpenACC. I don't know how to add its libraries to Visual Studio 2015. I've searched a lot on the Internet but couldn't find a good document showing the procedure. Could you please help me fix this? Thank you in advance. Answer 1: Visual Studio doesn't support OpenACC. You'll need to either use GNU or PGI on Linux, or PGI on Windows. Note that PGI doesn't support C++ on Windows. Hence, you'll want to write an OpenACC code in C and

RBM no improvement with OpenACC on the code yet

情到浓时终转凉″ submitted on 2019-12-12 04:22:36
Question: RBM is an open-source algorithm; the source code is available here: https://github.com/yusugomori/DeepLearning/tree/master/cpp I tried to get an improvement with OpenACC in different ways, but the sequential code still performs better. Can you tell me what should be done (which part needs to be improved) to get a significant improvement? #include <iostream> #include <math.h> #include "utils.h" #include "RBM.h" using namespace std; using namespace utils; RBM::RBM(int size, int n_v, int n_h, double **w, double *hb,

-ta=tesla:managed:cuda8 but cuMemAllocManaged returned error 2: Out of memory

拈花ヽ惹草 submitted on 2019-12-11 17:32:04
Question: I'm new to OpenACC. I like it very much so far, as I'm familiar with OpenMP. I have two 1080Ti cards, each with 9GB, and 128GB of RAM. I'm trying a very basic test to allocate an array, initialize it, then sum it up in parallel. This works for 8 GB, but when I increase to 10 GB I get an out-of-memory error. My understanding was that with the unified memory of Pascal (which these cards are) and CUDA 8, I could allocate an array larger than the GPU's memory and the hardware will page in and page out on