OpenACC

How can I find the id of a gang in OpenACC?

Submitted by 好久不见 on 2019-12-11 14:17:27
Question: In OpenMP I can use omp_get_thread_num() to get the numerical id of the thread executing the code. Is there a similar function I can use in OpenACC to get the id of the gang executing a piece of code?

Answer 1: The OpenACC standard does not yet include such a function, but with the PGI compiler you can use the compiler extension function __pgi_gangidx() as follows:

```cpp
// pgc++ -fast -acc -ta=tesla,cc60 -Minfo=accel test.cpp
#include <iostream>
#include "openacc.h"

int main(){
    int gangs = 100;
    int *ids =
```

How is variable in device memory used by external function?

Submitted by 荒凉一梦 on 2019-12-11 05:38:45
Question: In this code:

```cpp
#include <iostream>

void intfun(int *variable, int value){
    #pragma acc parallel present(variable[:1]) num_gangs(1) num_workers(1)
    {
        *variable = value;
    }
}

int main(){
    int var, value = 29;
    #pragma acc enter data create(var) copyin(value)
    intfun(&var, value);
    #pragma acc exit data copyout(var) delete(value)
    std::cout << var << std::endl;
}
```

how is int value recognized to be on device memory in intfun? If I replace present(variable[:1]) with present(variable[:1],value) in the intfun

Linking a PGI OpenACC-enabled library with gcc

Submitted by 泪湿孤枕 on 2019-12-11 05:18:29
Question: Briefly speaking, my question lies in compiling/building files (using libraries) with two different compilers while exploiting OpenACC constructs in the source files. I have a C source file that contains an OpenACC construct. It has only a simple function that computes the total sum of an array:

```c
#include <stdio.h>
#include <stdlib.h>
#include <openacc.h>

double calculate_sum(int n, double *a) {
    double sum = 0;
    int i;
    printf("Num devices: %d\n", acc_get_num_devices(acc_device_nvidia));
    #pragma
```
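One hedged way around the mixed-compiler link step (an assumption about the setup, not the accepted answer): let pgcc build the OpenACC part into a self-contained shared library, so gcc never has to locate the OpenACC/CUDA runtime libraries itself. File names and the `-ta=tesla` target are illustrative:

```shell
# Build the OpenACC source into a shared library with pgcc, which knows how
# to pull in its own OpenACC/CUDA runtime dependencies at link time:
pgcc -fPIC -acc -ta=tesla -c calculate_sum.c -o calculate_sum.o
pgcc -shared -acc -ta=tesla -o libsum.so calculate_sum.o

# Link the gcc-compiled main program against the shared library:
gcc main.c -L. -lsum -o main
LD_LIBRARY_PATH=. ./main
```

Linking the final executable with pgcc instead of gcc is the other common route, since the PGI driver adds the required runtime libraries automatically.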

OpenACC and object oriented C++

Submitted by 雨燕双飞 on 2019-12-11 04:27:17
Question: I am trying to write object-oriented C++ code that is parallelized with OpenACC. I was able to find some Stack Overflow questions and GTC talks on OpenACC, but I could not find real-world examples of object-oriented code. In this question an example of an OpenACCArray was shown that does some memory management in the background (code available at http://www.pgroup.com/lit/samples/gtc15_S5233.tar). However, I am wondering if it is possible to create a class that manages the arrays on a

Does OpenACC take away from the normal GPU Rendering?

Submitted by 安稳与你 on 2019-12-02 10:08:48
I'm trying to figure out if I can use OpenACC in place of normal CPU serial execution calls. My programming is usually all about 3D graphics, or uses the GPU in some other way, e.g. image processing or some other type of rendering that requires shaders. I'm trying to figure out whether this library would benefit me or not. The reason I ask is that if I'm rendering 3D graphics (as fast as possible), would it slow down that process in any way? Or is it able to maintain its (in theory) "high frame rates" or not? If so, what's the trade-off, and how much? I'm not willing to lose

Using OpenACC to parallelize nested loops

Submitted by ℡╲_俬逩灬. on 2019-11-30 10:33:56
I am very new to OpenACC and have only high-level knowledge, so any help and an explanation of what I am doing wrong would be appreciated. I am trying to accelerate (parallelize) a not-so-straightforward nested loop that updates a flattened (3D to 1D) array using OpenACC directives. I have posted a simplified sample below that, when compiled using pgcc -acc -Minfo=accel test.c, gives the following error:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Code:

```c
#include <stdio.h>
#include <stdlib.h>

#define min(a,b) (a > b) ? b : a
#define max(a,b) (a < b) ? b
```

OpenMP offloading to Nvidia wrong reduction

Submitted by 白昼怎懂夜的黑 on 2019-11-29 10:50:25
I am interested in offloading work to the GPU with OpenMP. The code below gives the correct value of sum on the CPU:

```cpp
// g++ -O3 -Wall foo.cpp -fopenmp
#pragma omp parallel for reduction(+:sum)
for(int i = 0; i < 2000000000; i++) sum += i%11;
```

It also works on the GPU with OpenACC like this:

```cpp
// g++ -O3 -Wall foo.cpp -fopenacc
#pragma acc parallel loop reduction(+:sum)
for(int i = 0; i < 2000000000; i++) sum += i%11;
```

nvprof shows that it runs on the GPU, and it's also faster than OpenMP on the CPU. However, when I try to offload to the GPU with OpenMP like this:

```cpp
// g++ -O3 -Wall foo.cpp -fopenmp -fno
```