OpenACC

How can I find the id of a gang in OpenACC?

Submitted by 好久不见 on 2019-12-11 14:17:27
Question: In OpenMP I can use omp_get_thread_num() to get the numerical id of the thread executing the code. Is there a similar function I can use in OpenACC to get the id of the gang executing a piece of code?

Answer 1: The OpenACC standard does not yet include such a function, but with the PGI compiler you can use the compiler extension function __pgi_gangidx() as follows:

```cpp
// pgc++ -fast -acc -ta=tesla,cc60 -Minfo=accel test.cpp
#include <iostream>
#include "openacc.h"

int main(){
    int gangs = 100;
    int *ids =
```

How is variable in device memory used by external function?

Submitted by 荒凉一梦 on 2019-12-11 05:38:45
Question: In this code:

```cpp
#include <iostream>

void intfun(int *variable, int value){
    #pragma acc parallel present(variable[:1]) num_gangs(1) num_workers(1)
    {
        *variable = value;
    }
}

int main(){
    int var, value = 29;
    #pragma acc enter data create(var) copyin(value)
    intfun(&var, value);
    #pragma acc exit data copyout(var) delete(value)
    std::cout << var << std::endl;
}
```

how is int value recognized to be on device memory in intfun? If I replace present(variable[:1]) with present(variable[:1],value) in the intfun

Linking a PGI OpenACC-enabled library with gcc

Submitted by 泪湿孤枕 on 2019-12-11 05:18:29
Question: Briefly speaking, my question lies in compiling/building files (using libraries) with two different compilers while exploiting OpenACC constructs in the source files. I have a C source file that contains an OpenACC construct. It has only a simple function that computes the total sum of an array:

```c
#include <stdio.h>
#include <stdlib.h>
#include <openacc.h>

double calculate_sum(int n, double *a) {
    double sum = 0;
    int i;
    printf("Num devices: %d\n", acc_get_num_devices(acc_device_nvidia));
    #pragma
```
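One hedged way around the mixed-compiler link step (an assumption about the setup, not the accepted answer): let pgcc build the OpenACC part into a self-contained shared library, so gcc never has to locate the OpenACC/CUDA runtime libraries itself. File names and the `-ta=tesla` target are illustrative:

```shell
# Build the OpenACC source into a shared library with pgcc, which knows how
# to pull in its own OpenACC/CUDA runtime dependencies at link time:
pgcc -fPIC -acc -ta=tesla -c calculate_sum.c -o calculate_sum.o
pgcc -shared -acc -ta=tesla -o libsum.so calculate_sum.o

# Link the gcc-compiled main program against the shared library:
gcc main.c -L. -lsum -o main
LD_LIBRARY_PATH=. ./main
```

Linking the final executable with pgcc instead of gcc is the other common route, since the PGI driver adds the required runtime libraries automatically.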

OpenACC and object oriented C++

Submitted by 雨燕双飞 on 2019-12-11 04:27:17
Question: I am trying to write object-oriented C++ code that is parallelized with OpenACC. I was able to find some Stack Overflow questions and GTC talks on OpenACC, but I could not find real-world examples of object-oriented code. In this question an example of an OpenACCArray was shown that does some memory management in the background (code available at http://www.pgroup.com/lit/samples/gtc15_S5233.tar). However, I am wondering if it is possible to create a class that manages the arrays on a

Does OpenACC take away from the normal GPU Rendering?

Submitted by 安稳与你 on 2019-12-02 10:08:48
I'm trying to figure out if I can use OpenACC in place of normal CPU serial execution calls. My programming is usually all about 3D graphics, or uses the GPU in some other way, e.g. image processing or some other type of rendering that requires shaders. I'm trying to figure out whether this library would benefit me or not. The reason I ask is that if I'm rendering 3D graphics (as fast as possible), would it slow down that process in any way? Or is it able to maintain its (in theory) "high frame rates" or not? If so, what's the trade-off, and how much? I'm not willing to lose

Using OpenACC to parallelize nested loops

Submitted by ℡╲_俬逩灬. on 2019-11-30 10:33:56
I am very new to OpenACC and have only high-level knowledge, so any help and an explanation of what I am doing wrong would be appreciated. I am trying to accelerate (parallelize) a not-so-straightforward nested loop that updates a flattened (3D to 1D) array using OpenACC directives. I have posted a simplified sample below that, when compiled using pgcc -acc -Minfo=accel test.c, gives the following error:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

Code:

```c
#include <stdio.h>
#include <stdlib.h>

#define min(a,b) (a > b) ? b : a
#define max(a,b) (a < b) ? b
```

OpenMP offloading to Nvidia wrong reduction

Submitted by 白昼怎懂夜的黑 on 2019-11-29 10:50:25
I am interested in offloading work to the GPU with OpenMP. The code below gives the correct value of sum on the CPU:

```cpp
// g++ -O3 -Wall foo.cpp -fopenmp
#pragma omp parallel for reduction(+:sum)
for(int i = 0; i < 2000000000; i++) sum += i%11;
```

It also works on the GPU with OpenACC like this:

```cpp
// g++ -O3 -Wall foo.cpp -fopenacc
#pragma acc parallel loop reduction(+:sum)
for(int i = 0; i < 2000000000; i++) sum += i%11;
```

nvprof shows that it runs on the GPU, and it's also faster than OpenMP on the CPU. However, when I try to offload to the GPU with OpenMP like this:

```cpp
// g++ -O3 -Wall foo.cpp -fopenmp -fno
```