OpenACC: passing a list of structs

Submitted by 杀马特。学长 韩版系。学妹 on 2020-01-05 04:27:09

Question


I have a C program that determines whether two sets of polygons overlap. The user inputs two sets of polygons (each set has several thousand polygons), and the program finds which polygons in set 1 overlap with which polygons in set 2.

I have two structs like these:

struct gpc_vertex  /* Polygon vertex */
{
    double          x;
    double          y;
};

struct gpc_vertex_list  /* Polygon contour */
{
    int             pid;            // polygon id
    int             num_vertices;
    double         *mbr;            // minimum bounding rectangle of the polygon, so always 4 elements
};

I have the following segment of code:

#pragma acc kernels copy(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2], listOfBoolean[0:dump])
for (i=0; i<polygonCount1; i++){
    polygon1 = listOfPolygons1[i];

    for (j=0; j<polygonCount2; j++){

        polygon2 = listOfPolygons2[j];
        idx = polygonCount2 * i + j;

        listOfBoolean[idx] = isRectOverlap(polygon1.mbr, polygon2.mbr);  // line 115

    }
}

listOfPolygons1 and listOfPolygons2 are (as the names imply) arrays of gpc_vertex_list. listOfBoolean is an array of int.
The mbrs of the two polygons are checked for overlap: the function "isRectOverlap" returns 1 if they overlap and 0 if they don't, and that value is stored in listOfBoolean.

Problem
The code compiles but fails at run time with the following error:

call to cuEventSynchronize returned error 700: Illegal address during kernel execution

My observations
The program compiles and runs if I change line 115 to this:

isRectOverlap(polygon1.mbr, polygon2.mbr); // without assigning value to listOfBoolean

or this:

listOfBoolean[idx] = 5; // assigning an arbitrary value

(the result is wrong, but at least it runs)

Question
Neither "isRectOverlap" nor "listOfBoolean" seems to cause the problem on its own; the error only appears when the value returned by "isRectOverlap" is stored in "listOfBoolean".
Does anyone know why the program can't run when I assign the return value of "isRectOverlap" to "listOfBoolean"?

The isRectOverlap function looks like this:

int isRectOverlap(double *shape1, double *shape2){

    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]){
        return 0;
    }

    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]){
        return 0;
    }

    return 1;

}

The program works correctly when compiled without OpenACC.

Thanks for helping


Answer 1:


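When aggregate data types are used in an OpenACC data clause, a shallow copy of the type is performed. What's most likely happening here is that when the listOfPolygons arrays are copied to the device, "mbr" still contains host addresses, so the program gives an illegal address error when an "mbr" is dereferenced on the device. This would also explain your observations: when the return value of isRectOverlap is discarded (or a constant is stored instead), the compiler can remove the call as dead code, so "mbr" is never dereferenced at all.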

Given that the comment says "mbr" always has 4 elements, the simplest fix is to make "mbr" a fixed-size array of 4 doubles.

Assuming you're using PGI compilers with an NVIDIA device, a second method is to use CUDA Unified Memory by compiling with "-ta=tesla:managed". All dynamically allocated memory is then managed by the CUDA runtime, which allows host addresses to be accessed on the device. The caveats are that it only works for dynamically allocated data, your whole program can only use as much memory as is available on the device, and it may slow your program down. http://www.pgroup.com/lit/articles/insider/v6n2a4.htm

A third option is to perform a deep copy of the aggregate type to the device. I can post an example if you decide to go this route. I also talk about the subject as part of a presentation I did at GTC2015: https://www.youtube.com/watch?v=rWLmZt_u5u4




Answer 2:


Here's a simplified example. The key is to use unstructured data regions at the same places where you allocate the host data. First allocate the array of structs, then create or copyin the array on the device. Here I just create the array, so the device data is garbage; if I did a copyin instead, a shallow copy would occur and the host addresses for "mbr" would be copied to the device. Either way, you then need to create each "mbr" on the device. The compiler will then "attach" the device "mbr" pointer, i.e. assign it into the device copy of the struct, overwriting the garbage/host pointer value. Once the "mbr" members hold valid device pointers, they can be dereferenced on the device.

% cat example_struct.c
#include <stdlib.h>
#include <stdio.h>
#ifndef N
#define N 1024
#endif

typedef struct gpc_vertex_list
{
    int pid;    // polygon id
    int num_vertices;
    double *mbr;   // minimum bounding rectangle of the polygon, so always 4 elements

} gpc_vertex_list;

gpc_vertex_list * allocData(size_t size);
void deleteData(gpc_vertex_list * A, size_t size);
void initData(gpc_vertex_list *A, size_t size);

#pragma acc routine seq
int isRectOverlap(double * mbr) {
    int result;  /* placeholder: sums the four mbr entries instead of a real overlap test */
    result = mbr[0];
    result += mbr[1];
    result += mbr[2];
    result += mbr[3];
    return result;
}

int main() {
    gpc_vertex_list *A;
    gpc_vertex_list B;
    size_t size, i;
    int * listOfBoolean;
    size = N;
    A=allocData(size);
    initData(A,size);
    listOfBoolean = (int*) malloc(sizeof(int)*size);

#pragma acc parallel loop present(A) copyout(listOfBoolean[0:size])  private(B)
    for (i=0; i<size; i++){
       B = A[i];
       listOfBoolean[i] = isRectOverlap(B.mbr);
    }

    printf("result: %d %d %d\n",listOfBoolean[0], listOfBoolean[size/2], listOfBoolean[size-1]);
    free(listOfBoolean);
    deleteData(A, size);
    exit(0);
}

gpc_vertex_list * allocData(size_t size) {
    gpc_vertex_list * tmp;
    tmp = (gpc_vertex_list *) malloc(size*sizeof(gpc_vertex_list));
/* Create the array on device.  */
#pragma acc enter data create(tmp[0:size])
    for (size_t i=0; i< size; ++i) {
       tmp[i].mbr = (double*) malloc(sizeof(double)*4);
/* create the member array on the device */
#pragma acc enter data create(tmp[i].mbr[0:4])
    }
    return tmp;
}

void deleteData(gpc_vertex_list * A, size_t size) {
/* Delete the device copies, then free the host memory. */
    for (size_t i=0; i< size; ++i) {
#pragma acc exit data delete(A[i].mbr[0:4])
        free(A[i].mbr);
    }
#pragma acc exit data delete(A[0:size])
    free(A);
}

void initData(gpc_vertex_list *A, size_t size) {
    size_t i;
    for (i=0; i< size; ++i) {
       A[i].pid = i;
       A[i].num_vertices = 4;
       for (int j=0; j<4;++j) {
           A[i].mbr[j]=(i*4)+j;
       }
       #pragma acc update device(A[i].pid,A[i].num_vertices,A[i].mbr[0:4])
    }
}
% pgcc example_struct.c -acc -Minfo=accel
isRectOverlap:
     20, Generating acc routine seq
main:
     39, Generating copyout(listOfBoolean[:size])
         Generating present(A[:])
         Accelerator kernel generated
         Generating Tesla code
         40, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     39, Local memory used for B
allocData:
     55, Generating enter data create(tmp[:size])
     59, Generating enter data create(tmp->mbr[:4])
deleteData:
     67, Generating exit data delete(A->mbr[:1])
     70, Generating exit data delete(A[:1])
initData:
     83, Generating update device(A->mbr[:4],A->pid,A->num_vertices)
% a.out
result: 6 8198 16374


Source: https://stackoverflow.com/questions/38779782/openacc-passing-a-list-of-struct
