Dealing with Boundary conditions / Halo regions in CUDA

甜味超标 2020-12-05 16:28

I'm working on image processing with CUDA and I have a question about pixel processing.

What is often done with the boundary pixels of an image when applying a

3 answers
  •  慢半拍i (original poster)
    2020-12-05 17:09

    Your question is somewhat broad and I believe it mixes two problems:

    1. dealing with boundary conditions;
    2. dealing with halo regions.

    The first problem (boundary conditions) is encountered, for example, when computing the convolution between an image and a 3 x 3 kernel. When the convolution window crosses the image boundary, the image has to be extended beyond its borders in some way.
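
    As a rough illustration of what "extending the image" can mean, the sketch below maps an out-of-range index back into the valid range in three common ways: clamp, wrap (periodic), and mirror. The helper names (`clamp_index`, `wrap_index`, `mirror_index`, `average3_periodic`) are hypothetical and not taken from the question or from any library. The `HD` macro is only there so the helpers can be called from a kernel when compiled with nvcc and still build as plain C++ for host-side checks.

    ```cpp
    #include <cassert>
    #include <vector>

    #ifdef __CUDACC__
    #define HD __host__ __device__   // also callable from kernels under nvcc
    #else
    #define HD                       // plain C++ build
    #endif

    // Clamp (replicate the edge): ... 0 0 | 0 1 ... n-1 | n-1 n-1 ...
    HD inline int clamp_index(int i, int n) {
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    }

    // Wrap (periodic): ... n-2 n-1 | 0 1 ... n-1 | 0 1 ...
    HD inline int wrap_index(int i, int n) {
        assert(n > 0);
        int r = i % n;               // the C++ remainder keeps the sign of i
        return r < 0 ? r + n : r;
    }

    // Mirror (reflect, edge sample repeated): ... 1 0 | 0 1 ... n-1 | n-1 n-2 ...
    HD inline int mirror_index(int i, int n) {
        assert(n > 0);
        int r = i % (2 * n);
        if (r < 0) r += 2 * n;
        return r < n ? r : 2 * n - 1 - r;
    }

    // Host reference: three-point average with periodic extension,
    // handy for checking a GPU result element by element.
    inline std::vector<float> average3_periodic(const std::vector<float>& x) {
        const int n = static_cast<int>(x.size());
        std::vector<float> y(n);
        for (int i = 0; i < n; ++i)
            y[i] = (x[wrap_index(i - 1, n)] + x[i] + x[wrap_index(i + 1, n)]) / 3.0f;
        return y;
    }
    ```

    Which extension is appropriate depends on the data: wrap suits genuinely periodic signals, while clamp or mirror are usually less surprising at the borders of a photograph.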

    The second problem (halo regions) is encountered, for example, when a 16 x 16 tile is loaded into shared memory but only the interior 14 x 14 tile can be processed, as happens when computing second-order derivatives: the outer ring of pixels (the halo) is needed only as neighbours.
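
    The 16 x 16 / 14 x 14 arrangement can be sketched as follows. This is only one possible layout, written under stated assumptions: every thread of a 16 x 16 block loads one pixel, blocks advance by 14 pixels so that neighbouring tiles overlap, and halo pixels that fall outside the image are clamped to the edge. All names (`TILE`, `HALO`, `INNER`, `global_coord`, `is_interior`, `blocks_for`, `laplacian_tiled`) are hypothetical. The kernel is compiled only under nvcc; the index helpers also build as plain C++.

    ```cpp
    #include <cassert>

    #ifdef __CUDACC__
    #define HD __host__ __device__
    #else
    #define HD
    #endif

    constexpr int TILE  = 16;               // block side = side of the tile held in shared memory
    constexpr int HALO  = 1;                // halo width for a 3-point second-difference stencil
    constexpr int INNER = TILE - 2 * HALO;  // 14: side of the interior tile that is computed

    // Global pixel coordinate loaded by thread t of block b along one axis.
    // Blocks advance by INNER, so neighbouring tiles overlap by 2 * HALO pixels.
    HD inline int global_coord(int b, int t) { return b * INNER + t - HALO; }

    // True for the 14 threads per axis that own an output pixel.
    HD inline bool is_interior(int t) { return t >= HALO && t < TILE - HALO; }

    // Number of blocks needed to cover n output pixels along one axis.
    HD inline int blocks_for(int n) { assert(n > 0); return (n + INNER - 1) / INNER; }

    #ifdef __CUDACC__
    // Discrete Laplacian (sum of second-order differences in x and y).
    __global__ void laplacian_tiled(const float* __restrict__ in,
                                    float* __restrict__ out,
                                    int width, int height) {
        __shared__ float tile[TILE][TILE];

        const int tx = threadIdx.x, ty = threadIdx.y;
        const int gx = global_coord(blockIdx.x, tx);
        const int gy = global_coord(blockIdx.y, ty);

        // Every thread loads one pixel; reads outside the image are clamped
        // to the nearest edge pixel (an assumed boundary condition).
        const int cx = min(max(gx, 0), width - 1);
        const int cy = min(max(gy, 0), height - 1);
        tile[ty][tx] = in[cy * width + cx];
        __syncthreads();

        // Only interior threads that map to a real pixel write a result.
        if (is_interior(tx) && is_interior(ty) && gx < width && gy < height) {
            out[gy * width + gx] = tile[ty][tx - 1] + tile[ty][tx + 1]
                                 + tile[ty - 1][tx] + tile[ty + 1][tx]
                                 - 4.0f * tile[ty][tx];
        }
    }
    #endif
    ```

    A launch would use `dim3 block(TILE, TILE)` and `dim3 grid(blocks_for(width), blocks_for(height))`. The cost of this layout is that 60 of the 256 threads in each block only load data and never compute.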

    For the second issue, I think a useful question is the following: Analyzing memory access coalescing of my CUDA kernel.

    Concerning the extension of a signal outside of its boundaries, texture memory is a useful tool thanks to the addressing modes it provides (clamp, border, wrap, and mirror); see The different addressing modes of CUDA textures. Note that wrap and mirror are only supported with normalized texture coordinates.

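    Below is an example of how a simple filter can be implemented with periodic boundary conditions using texture memory. Despite the median-filter naming, the kernel computes a three-point average of each sample and its two neighbours.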

    #include <stdio.h>
    #include <stdlib.h>

    #include "TimingGPU.cuh"
    #include "Utilities.cuh"   // the author's helper header; provides gpuErrchk

    // --- Legacy texture reference (this API is deprecated and was removed in CUDA 12)
    texture<float, 1, cudaReadModeElementType> signal_texture;

    #define BLOCKSIZE 32

    /*************************************************/
    /* KERNEL FUNCTION FOR MEDIAN FILTER CALCULATION */
    /*************************************************/
    __global__ void median_filter_periodic_boundary(float * __restrict__ d_vec, const int N){

        // --- Signed index, so that the left neighbour of sample 0 does not underflow
        int tid = threadIdx.x + blockIdx.x * blockDim.x;

        if (tid < N) {

            // --- Wrap addressing needs normalized coordinates: (i + 0.5) / N is the center of texel i
            float signal_center = tex1D(signal_texture, (tid + 0.5f) / N);
            float signal_before = tex1D(signal_texture, (tid - 0.5f) / N);
            float signal_after  = tex1D(signal_texture, (tid + 1.5f) / N);

            printf("%i %f %f %f\n", tid, signal_before, signal_center, signal_after);

            // --- Three-point average of the sample and its two periodic neighbours
            d_vec[tid] = (signal_center + signal_before + signal_after) / 3.f;

        }
    }


    /********/
    /* MAIN */
    /********/
    int main() {

        const int N = 10;

        // --- Input host array declaration and initialization
        float *h_arr = (float *)malloc(N * sizeof(float));
        for (int i = 0; i < N; i++) h_arr[i] = (float)i;

        // --- Output host and device array vectors
        float *h_vec = (float *)malloc(N * sizeof(float));
        float *d_vec;   gpuErrchk(cudaMalloc(&d_vec, N * sizeof(float)));

        // --- CUDA array declaration and texture memory binding; CUDA array initialization
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
        //Alternatively
        //cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

        cudaArray *d_arr;   gpuErrchk(cudaMallocArray(&d_arr, &channelDesc, N, 1));
        gpuErrchk(cudaMemcpyToArray(d_arr, 0, 0, h_arr, N * sizeof(float), cudaMemcpyHostToDevice));

        // --- Texture attributes are set before binding; wrap mode requires normalized coordinates
        signal_texture.normalized = true;
        signal_texture.addressMode[0] = cudaAddressModeWrap;
        signal_texture.filterMode = cudaFilterModePoint;
        gpuErrchk(cudaBindTextureToArray(signal_texture, d_arr));

        // --- Kernel execution
        median_filter_periodic_boundary<<<(N + BLOCKSIZE - 1) / BLOCKSIZE, BLOCKSIZE>>>(d_vec, N);
        gpuErrchk(cudaPeekAtLastError());
        gpuErrchk(cudaDeviceSynchronize());

        gpuErrchk(cudaMemcpy(h_vec, d_vec, N * sizeof(float), cudaMemcpyDeviceToHost));

        // --- Print the filtered signal
        for (int i = 0; i < N; i++) printf("h_vec[%i] = %f\n", i, h_vec[i]);

        return 0;
    }
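
    The listing above relies on texture references, which recent CUDA toolkits no longer provide. A minimal, untested sketch of the same periodic three-point average using the texture object API might look like the following. It is not the answer author's code: the names `texel_center`, `average3_periodic` and `check` are hypothetical, and it assumes that a 1D CUDA array can be filled with `cudaMemcpy2DToArray` using a height of 1. The CUDA part is compiled only under nvcc; the coordinate helper also builds as plain C++.

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <cstdlib>

    #ifdef __CUDACC__
    #define HD __host__ __device__
    #else
    #define HD
    #endif

    // Normalized coordinate of the center of texel i in an n-texel 1D texture.
    // Values below 0 or above 1 are folded back by the wrap addressing mode.
    HD inline float texel_center(int i, int n) { return (i + 0.5f) / n; }

    #ifdef __CUDACC__
    #include <cuda_runtime.h>

    // Hypothetical error-check helper, standing in for the source's gpuErrchk.
    static void check(cudaError_t e) {
        if (e != cudaSuccess) { fprintf(stderr, "%s\n", cudaGetErrorString(e)); exit(1); }
    }

    __global__ void average3_periodic(cudaTextureObject_t tex, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float before = tex1D<float>(tex, texel_center(i - 1, n));
            float center = tex1D<float>(tex, texel_center(i, n));
            float after  = tex1D<float>(tex, texel_center(i + 1, n));
            out[i] = (before + center + after) / 3.f;
        }
    }

    int main() {
        const int n = 10;
        float h_in[n], h_out[n];
        for (int i = 0; i < n; i++) h_in[i] = (float)i;

        // 1D CUDA array holding the signal (height defaults to 0, i.e. 1D).
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaArray_t arr = nullptr;
        check(cudaMallocArray(&arr, &desc, n));
        check(cudaMemcpy2DToArray(arr, 0, 0, h_in, n * sizeof(float),
                                  n * sizeof(float), 1, cudaMemcpyHostToDevice));

        // Resource descriptor: the texture reads from the array.
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = arr;

        // Texture descriptor: periodic addressing, no interpolation.
        cudaTextureDesc td = {};
        td.addressMode[0]   = cudaAddressModeWrap;
        td.filterMode       = cudaFilterModePoint;
        td.readMode         = cudaReadModeElementType;
        td.normalizedCoords = 1;   // required for wrap addressing

        cudaTextureObject_t tex = 0;
        check(cudaCreateTextureObject(&tex, &res, &td, nullptr));

        float* d_out = nullptr;
        check(cudaMalloc(&d_out, n * sizeof(float)));
        average3_periodic<<<(n + 31) / 32, 32>>>(tex, d_out, n);
        check(cudaGetLastError());
        check(cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost));

        for (int i = 0; i < n; i++) printf("out[%d] = %f\n", i, h_out[i]);

        check(cudaDestroyTextureObject(tex));
        check(cudaFreeArray(arr));
        check(cudaFree(d_out));
        return 0;
    }
    #endif
    ```

    Because the texture object is passed as a kernel argument rather than declared at file scope, the same kernel can be reused with textures configured for clamp or mirror addressing by changing only `td.addressMode[0]`.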
