Dealing with Boundary conditions / Halo regions in CUDA

甜味超标 2020-12-05 16:28

I'm working on image processing with CUDA and I have a doubt about pixel processing.

What is often done with the boundary pixels of an image when applying a

3 answers

慢半拍i (original poster)

2020-12-05 17:09

Your question is somewhat broad and I believe it mixes two problems:

dealing with boundary conditions;

dealing with halo regions.

The first problem (boundary conditions) is encountered, for example, when computing the convolution between an image and a 3 x 3 kernel. When the convolution window crosses the image boundary, some of its taps fall outside the image, so one has to decide how to extend the image beyond its boundaries.
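As a rough illustration of the boundary-condition problem, here is a minimal host-side sketch of a 3 x 3 convolution that extends the image with zeros (zero padding). The function name `convolve3x3_zero_pad` and the row-major layout are my own assumptions for the example, and zero padding is only one possible choice; replicating the edge pixel or wrapping around are equally common.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 3 x 3 convolution with zero padding: taps that fall outside the image
// contribute nothing. Correlation form (the kernel is not flipped).
// `convolve3x3_zero_pad` is an illustrative name, not a library function.
std::vector<float> convolve3x3_zero_pad(const std::vector<float>& img,
                                        int width, int height,
                                        const float (&k)[9]) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    std::vector<float> out(img.size(), 0.f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = x + dx;
                    const int sy = y + dy;
                    // Boundary condition: skip taps outside the image
                    // (equivalent to reading implicit zeros).
                    if (sx < 0 || sx >= width || sy < 0 || sy >= height) continue;
                    acc += k[(dy + 1) * 3 + (dx + 1)] * img[sy * width + sx];
                }
            }
            out[y * width + x] = acc;
        }
    }
    return out;
}
```

With an all-ones kernel on an all-ones image, interior pixels sum nine taps while corner pixels sum only four, which makes the effect of the chosen extension visible at the border.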

The second problem (halo regions) is encountered, for example, when a 16 x 16 tile is loaded into shared memory but only the internal 14 x 14 tile can be processed to compute second-order derivatives, because the outer one-pixel ring (the halo) is needed only as neighbor data.
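The tile arithmetic behind the 16 x 16 / 14 x 14 example can be sketched as follows. The names `TILE`, `HALO`, `INNER`, `global_coord`, `is_interior` and `blocks_for` are hypothetical, and the scheme shown (every thread loads one pixel, only interior threads compute) is just one common way to organize halo loading. The small macro guard lets the same helpers be called from a kernel when built with nvcc, or checked on the host with a plain C++ compiler.

```cpp
#include <cassert>

// Let the same helpers build with nvcc (CUDA) or a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

constexpr int TILE  = 16;               // pixels loaded per tile side (one per thread)
constexpr int HALO  = 1;                // ring width needed by a 3-point stencil
constexpr int INNER = TILE - 2 * HALO;  // 14: pixels actually computed per tile side

// Global coordinate loaded by `thread` of `block` along one axis.
// Adjacent tiles overlap by 2 * HALO pixels, so blocks advance by INNER.
__host__ __device__ inline int global_coord(int block, int thread) {
    return block * INNER + thread - HALO;
}

// True for threads that produce an output; halo threads only load data.
__host__ __device__ inline bool is_interior(int thread) {
    return thread >= HALO && thread < TILE - HALO;
}

// Number of blocks needed to cover n pixels along one axis.
__host__ __device__ inline int blocks_for(int n) {
    assert(n > 0);
    return (n + INNER - 1) / INNER;
}
```

Note that `global_coord(0, 0)` is -1: the halo of the first tile lies outside the image, which is exactly where the halo problem meets the boundary-condition problem.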

For the second issue, I think a useful question is the following: Analyzing memory access coalescing of my CUDA kernel.

Concerning the extension of a signal outside of its boundaries, texture memory is a useful tool because of the addressing modes it provides; see The different addressing modes of CUDA textures.
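To make the addressing modes concrete, the sketch below emulates on integer texel indices what the four CUDA modes (`cudaAddressModeWrap`, `cudaAddressModeClamp`, `cudaAddressModeMirror`, `cudaAddressModeBorder`) do to an out-of-range read. The enum `AddressMode` and the function `resolve_index` are hypothetical stand-ins written for this example, not part of the CUDA API, and the hardware works on floating-point coordinates rather than integer indices. Keep in mind that in CUDA the wrap and mirror modes are only supported with normalized texture coordinates.

```cpp
#include <cassert>

// Let the same helper build with nvcc (CUDA) or a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for the CUDA texture addressing modes.
enum class AddressMode { Wrap, Clamp, Mirror, Border };

// Maps a requested index i onto a signal of n samples.
// Returns the index actually read, or -1 in Border mode to mean
// "out of range: the fetch returns the border value (zero)".
__host__ __device__ inline int resolve_index(int i, int n, AddressMode mode) {
    assert(n > 0);
    switch (mode) {
    case AddressMode::Wrap: {            // periodic: -1 -> n - 1, n -> 0
        const int r = i % n;
        return r < 0 ? r + n : r;
    }
    case AddressMode::Clamp:             // replicate the edge sample
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    case AddressMode::Mirror: {          // reflect: -1 -> 0, n -> n - 1
        const int period = 2 * n;
        int r = i % period;
        if (r < 0) r += period;
        return r < n ? r : period - 1 - r;
    }
    case AddressMode::Border:            // zero outside the signal
        return (i < 0 || i >= n) ? -1 : i;
    }
    return -1;
}
```

For a 10-sample signal, a read at index -1 resolves to sample 9 with wrap, sample 0 with clamp or mirror, and the zero border value with border.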

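Below, I'm providing an example of how a three-point filter can be implemented with periodic boundary conditions using texture memory. Note that, despite its name, the kernel averages each sample with its two neighbors rather than taking their median, so it is strictly a moving-average filter.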

#include <stdio.h>
#include <stdlib.h>

#include "TimingGPU.cuh"
#include "Utilities.cuh"

// --- Legacy texture reference API (deprecated in CUDA 11, removed in CUDA 12)
texture<float, 1, cudaReadModeElementType> signal_texture;

#define BLOCKSIZE 32

/*************************************************/
/* KERNEL FUNCTION FOR MEDIAN FILTER CALCULATION */
/*************************************************/
__global__ void median_filter_periodic_boundary(float * __restrict__ d_vec, const unsigned int N){

    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < N) {

        // --- Center of texel tid. Wrap addressing requires normalized coordinates,
        //     so every coordinate is divided by N; float arithmetic also avoids the
        //     unsigned underflow of tid - 1 when tid == 0.
        const float c = (float)tid + 0.5f;

        float signal_center = tex1D(signal_texture,  c        / N);
        float signal_before = tex1D(signal_texture, (c - 1.f) / N);
        float signal_after  = tex1D(signal_texture, (c + 1.f) / N);

        printf("%u %f %f %f\n", tid, signal_before, signal_center, signal_after);

        // --- Three-point average (not a true median)
        d_vec[tid] = (signal_center + signal_before + signal_after) / 3.f;
    }
}

/********/
/* MAIN */
/********/
int main() {

    const int N = 10;

    // --- Input host array declaration and initialization
    float *h_arr = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) h_arr[i] = (float)i;

    // --- Output host and device array vectors
    float *h_vec = (float *)malloc(N * sizeof(float));
    float *d_vec; gpuErrchk(cudaMalloc(&d_vec, N * sizeof(float)));

    // --- CUDA array declaration and texture memory binding; CUDA array initialization
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
    // Alternatively
    // cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

    cudaArray *d_arr; gpuErrchk(cudaMallocArray(&d_arr, &channelDesc, N, 1));
    gpuErrchk(cudaMemcpyToArray(d_arr, 0, 0, h_arr, N * sizeof(float), cudaMemcpyHostToDevice));

    // --- Texture attributes are set before binding; cudaAddressModeWrap is only
    //     supported with normalized coordinates
    signal_texture.normalized = true;
    signal_texture.addressMode[0] = cudaAddressModeWrap;
    gpuErrchk(cudaBindTextureToArray(signal_texture, d_arr));

    // --- Kernel execution (the launch configuration was lost in the source;
    //     one thread per sample is assumed here)
    median_filter_periodic_boundary<<<(N + BLOCKSIZE - 1) / BLOCKSIZE, BLOCKSIZE>>>(d_vec, N);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());

    gpuErrchk(cudaMemcpy(h_vec, d_vec, N * sizeof(float), cudaMemcpyDeviceToHost));

    // --- The source listing is cut off inside this loop; printing the results
    //     and returning is an assumed completion
    for (int i = 0; i < N; i++) printf("h_vec[%i] = %f\n", i, h_vec[i]);

    return 0;
}
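To check what the kernel above should produce, a small CPU reference can be handy. The sketch below is my own addition with hypothetical names (`wrap_index`, `median3`, `mean3_periodic`, `median3_periodic`): it computes the three-point average with periodic boundaries, which is what the kernel evaluates, and also a true three-point median for comparison.

```cpp
#include <cassert>
#include <vector>

// Periodic index: -1 maps to n - 1 and n maps to 0.
inline int wrap_index(int i, int n) {
    assert(n > 0);
    const int r = i % n;
    return r < 0 ? r + n : r;
}

// Median of three values.
inline float median3(float a, float b, float c) {
    const float lo = a < b ? a : b;
    const float hi = a < b ? b : a;
    return c < lo ? lo : (c > hi ? hi : c);
}

// Three-point moving average with periodic boundaries
// (same operand order as the GPU kernel: center + before + after).
std::vector<float> mean3_periodic(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int i = 0; i < n; ++i) {
        const float before = in[wrap_index(i - 1, n)];
        const float after  = in[wrap_index(i + 1, n)];
        out[i] = (in[i] + before + after) / 3.f;
    }
    return out;
}

// True three-point median filter with periodic boundaries.
std::vector<float> median3_periodic(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int i = 0; i < n; ++i) {
        out[i] = median3(in[wrap_index(i - 1, n)], in[i], in[wrap_index(i + 1, n)]);
    }
    return out;
}
```

For the input 0, 1, ..., 9 used in the listing, the average leaves samples 1 to 8 unchanged and only the two end samples differ, because their neighborhoods wrap around to the opposite end of the signal.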