Change cuda::GpuMat values through custom kernel

Question

I am using a kernel to "loop" over a live camera stream to highlight specific color regions. These regions cannot always be isolated with a few cv::threshold calls, therefore I am using a custom kernel.

The current kernel is as follows:

__global__ void custom_kernel(unsigned char* input, unsigned char* output, int width, int height, int colorWidthStep, int outputWidthStep) {
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;

    if ((xIndex < width) && (yIndex < height)) {
        const int color_tid = yIndex * colorWidthStep + (3*xIndex);
        const int output_tid = yIndex * outputWidthStep + (3*xIndex);
        const unsigned char red   = input[color_tid+0];
        const unsigned char green = input[color_tid+1];
        const unsigned char blue  = input[color_tid+2];
        if (!(red > 100 && blue < 50 && red > 1.0*green)) {
            output[output_tid] = 255;
            output[output_tid+1] = 255;
            output[output_tid+2] = 255;
        } else {
            output[output_tid] = 0;
            output[output_tid+1] = 0;
            output[output_tid+2] = 0;
        }
    }
}

This kernel gets called here:

extern "C" void myFunction(cv::cuda::GpuMat& input, cv::cuda::GpuMat& output) {
    // Calculate total number of bytes of input and output image
    const int colorBytes = input.step * input.rows;
    const int outputBytes = output.step * output.rows;

    unsigned char *d_input, *d_output;

    // Allocate device memory
    SAFE_CALL(cudaMalloc<unsigned char>(&d_input,colorBytes),"CUDA Malloc Failed");
    SAFE_CALL(cudaMalloc<unsigned char>(&d_output,outputBytes),"CUDA Malloc Failed");

    // Copy data from OpenCV input image to device memory
    SAFE_CALL(cudaMemcpy(d_input,input.ptr(),colorBytes,cudaMemcpyHostToDevice),"CUDA Memcpy Host To Device Failed");

    // Specify a reasonable block size
    const dim3 block(16,16);

    // Calculate grid size to cover the whole image
    const dim3 grid((input.cols + block.x - 1)/block.x, (input.rows + block.y - 1)/block.y);

    // Launch the color conversion kernel
    custom_kernel<<<grid,block>>>(d_input,d_output,input.cols,input.rows,input.step,output.step);

    // Synchronize to check for any kernel launch errors
    SAFE_CALL(cudaDeviceSynchronize(),"Kernel Launch Failed");

    // Copy back data from destination device memory to OpenCV output image
    SAFE_CALL(cudaMemcpy(output.ptr(),d_output,outputBytes,cudaMemcpyDeviceToHost),"CUDA Memcpy Device To Host Failed");

    // Free the device memory
    SAFE_CALL(cudaFree(d_input),"CUDA Free Failed");
    SAFE_CALL(cudaFree(d_output),"CUDA Free Failed");
}

I included an example image that shows the result of the kernel on a red car. As you can see, there are vertical red lines, even though I tried to access the RGB/BGR values and set them to either 0 or 255.

I used the following as a starting point, but I suspect that cv::Mat and cv::cuda::GpuMat do not store their values in the same way. I read that GpuMat only holds a pointer to its data, and assumed it would be indexed with the blockIdx and blockDim parameters. https://github.com/sshniro/opencv-samples/blob/master/cuda-bgr-grey.cpp

Specific questions:

What is the reason for the red lines?
How can I change the RGB values correctly?

I am using CUDA 10.2 on Ubuntu 18.04 on an NVIDIA Xavier NX.

As suggested in the comments, I changed the parameters of the cudaMemcpy calls and deleted the cudaMalloc and cudaFree parts. I also remembered that OpenCV stores color in BGR order, so I changed the (+0,+1,+2) offsets inside the kernel. Finally, I loaded the red car image directly via cv::imread to rule out any earlier formatting errors. With those changes, the kernel works.

Answer 1:

As mentioned by @sgarizvi in the comments, the cv::cuda::GpuMat already resides on the GPU, so I had to use cudaMemcpyDeviceToDevice instead of cudaMemcpyHostToDevice.

It was also not necessary to allocate new memory, so I deleted the cudaMalloc and cudaFree parts of the code above.

Lastly (specific to my case, it might differ for others), my image source was the ZED 2 from Stereolabs, which publishes its images in RGBA, so the order in memory is R -> G -> B -> A. Converted to OpenCV it is B -> G -> R -> A, which is 4 bytes per pixel:

const int color_tid = yIndex * colorWidthStep + (4*xIndex);
const int output_tid = yIndex * outputWidthStep + (4*xIndex);

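So to address each pixel correctly, you have to advance the pointer by four times the xIndex. Use three times if you just have a BGR/RGB image, or once if it is grayscale. Indexing a 4-channel image with 3*xIndex makes each thread read and write bytes that straddle pixel boundaries, which is consistent with the vertical lines in the output.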
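To make the indexing rule concrete, here is a minimal host-side sketch in plain C++ (no CUDA or OpenCV needed), so the arithmetic can be checked on the CPU before moving it into the kernel. The names pixelOffset and maskRedReference are hypothetical helpers, not part of OpenCV or the original code. It assumes 8-bit interleaved data in B, G, R(, A) order and a row pitch in bytes, like GpuMat's step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in a pitched, interleaved 8-bit image.
// step:     bytes per row, possibly including padding (like GpuMat::step)
// channels: bytes per pixel (1 = gray, 3 = BGR/RGB, 4 = BGRA/RGBA)
inline std::size_t pixelOffset(int x, int y, std::size_t step, int channels) {
    assert(x >= 0 && y >= 0 && channels > 0);
    return static_cast<std::size_t>(y) * step
         + static_cast<std::size_t>(x) * static_cast<std::size_t>(channels);
}

// CPU reference of the per-pixel rule from the question's kernel.
// Assumption: channel order is B, G, R (plus optional A).
// Pixels that pass the "red" test become black, all others white;
// an alpha byte, if present, is copied through unchanged.
void maskRedReference(const std::uint8_t* in, std::uint8_t* out,
                      int width, int height,
                      std::size_t inStep, std::size_t outStep, int channels) {
    assert(channels == 3 || channels == 4);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = pixelOffset(x, y, inStep, channels);
            const std::size_t o = pixelOffset(x, y, outStep, channels);
            const std::uint8_t blue  = in[i + 0];
            const std::uint8_t green = in[i + 1];
            const std::uint8_t red   = in[i + 2];
            const bool isRed = red > 100 && blue < 50 && red > green;
            const std::uint8_t value = isRed ? 0 : 255;
            out[o + 0] = value;
            out[o + 1] = value;
            out[o + 2] = value;
            if (channels == 4) {
                out[o + 3] = in[i + 3];  // keep alpha as-is
            }
        }
    }
}
```

For example, with a row pitch of 1920 bytes and 3 channels, pixel (3, 2) starts at byte 2*1920 + 3*3 = 3849. Passing the channel count as a parameter, rather than hard-coding 3 or 4, is one way to keep the same logic usable for both BGR and BGRA sources.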

Source: https://stackoverflow.com/questions/65126907/change-cudagpumat-values-through-custom-kernel

Tags

c++

OpenCV

cuda