Question
I am using a kernel to "loop" over a live camera stream and highlight specific color regions. These regions cannot always be reproduced with a combination of cv::threshold calls, so I am using a custom kernel instead.
The current kernel is as follows:
__global__ void custom_kernel(unsigned char* input, unsigned char* output, int width, int height, int colorWidthStep, int outputWidthStep) {
const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;
if ((xIndex < width) && (yIndex < height)) {
const int color_tid = yIndex * colorWidthStep + (3*xIndex);
const int output_tid = yIndex * outputWidthStep + (3*xIndex);
const unsigned char red = input[color_tid+0];
const unsigned char green = input[color_tid+1];
const unsigned char blue = input[color_tid+2];
if (!(red > 100 && blue < 50 && red > 1.0*green)) {
output[output_tid] = 255;
output[output_tid+1] = 255;
output[output_tid+2] = 255;
} else {
output[output_tid] = 0;
output[output_tid+1] = 0;
output[output_tid+2] = 0;
}
}
}
This kernel gets called here:
extern "C" void myFunction(cv::cuda::GpuMat& input, cv::cuda::GpuMat& output) {
// Calculate total number of bytes of input and output image
const int colorBytes = input.step * input.rows;
const int outputBytes = output.step * output.rows;
unsigned char *d_input, *d_output;
// Allocate device memory
SAFE_CALL(cudaMalloc<unsigned char>(&d_input,colorBytes),"CUDA Malloc Failed");
SAFE_CALL(cudaMalloc<unsigned char>(&d_output,outputBytes),"CUDA Malloc Failed");
// Copy data from OpenCV input image to device memory
SAFE_CALL(cudaMemcpy(d_input,input.ptr(),colorBytes,cudaMemcpyHostToDevice),"CUDA Memcpy Host To Device Failed");
// Specify a reasonable block size
const dim3 block(16,16);
// Calculate grid size to cover the whole image
const dim3 grid((input.cols + block.x - 1)/block.x, (input.rows + block.y - 1)/block.y);
// Launch the color conversion kernel
custom_kernel<<<grid,block>>>(d_input,d_output,input.cols,input.rows,input.step,output.step);
// Synchronize to check for any kernel launch errors
SAFE_CALL(cudaDeviceSynchronize(),"Kernel Launch Failed");
// Copy back data from destination device memory to OpenCV output image
SAFE_CALL(cudaMemcpy(output.ptr(),d_output,outputBytes,cudaMemcpyDeviceToHost),"CUDA Memcpy Device To Host Failed");
// Free the device memory
SAFE_CALL(cudaFree(d_input),"CUDA Free Failed");
SAFE_CALL(cudaFree(d_output),"CUDA Free Failed");
}
I included an example image that shows the result of the kernel on a red car. As you can see, there are vertical red lines, even though I access the RGB/BGR values and set them to either 0 or 255.
I used the following sample as a starting point, but I suspect that cv::Mat and cv::cuda::GpuMat do not store their values in the same way. I read that GpuMat only holds a pointer to its data, and assumed that pointer would be used together with the blockIdx and blockDim parameters.
https://github.com/sshniro/opencv-samples/blob/master/cuda-bgr-grey.cpp
Specific questions:
What is the reason for the red lines?
How can I change the RGB values correctly?
I am using CUDA 10.2 on Ubuntu 18.04 on an NVIDIA Xavier NX.
As mentioned in the comments, I changed the parameters of the cudaMemcpy function and deleted the cudaMalloc and cudaFree parts. Additionally, I reminded myself that OpenCV stores color in BGR order, so I changed the (+0,+1,+2) offsets inside the kernel.
I also loaded the red car image directly via cv::imread to rule out any earlier formatting errors. With great success: the kernel works.
Answer 1:
As mentioned by @sgarizvi in the comments, the cv::cuda::GpuMat already resides on the GPU, so I had to use cudaMemcpyDeviceToDevice instead of cudaMemcpyHostToDevice.
It was also not necessary to allocate new memory, so I deleted the cudaMalloc and cudaFree parts of the code above.
Finally (specific to my case, it may differ for others), my image source was the ZED 2 from Stereolabs, which publishes its images in RGBA, so the order in memory is R -> G -> B -> A. Converted to OpenCV it is B -> G -> R -> A, which means four bytes per pixel:
const int color_tid = yIndex * colorWidthStep + (4*xIndex);
const int output_tid = yIndex * outputWidthStep + (4*xIndex);
So to address each pixel correctly, you have to advance the pointer by four times xIndex; use three times xIndex for a plain BGR/RGB image, or one times xIndex for grayscale.
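The addressing rule above can be checked on the CPU before moving it into the kernel. The following is only a minimal host-side sketch, not the code I actually ran: the names pixel_offset, is_target_red and highlight_red are made up for illustration, and it assumes 8-bit channels in OpenCV order (B, G, R, optionally A) with the row pitch given in bytes, as GpuMat::step is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in a pitched 8-bit image.
// step     = row pitch in bytes (what GpuMat::step reports)
// channels = bytes per pixel: 1 grayscale, 3 BGR, 4 BGRA
inline std::size_t pixel_offset(int x, int y, std::size_t step, int channels) {
    return static_cast<std::size_t>(y) * step +
           static_cast<std::size_t>(x) * static_cast<std::size_t>(channels);
}

// Same threshold as in the question, but with the channels named in BGR order.
inline bool is_target_red(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return r > 100 && b < 50 && r > g;
}

// CPU stand-in for the kernel: the two loops play the role of xIndex/yIndex.
// Matching pixels become black, all others white, as in the original kernel.
void highlight_red(const std::uint8_t* in, std::uint8_t* out,
                   int width, int height,
                   std::size_t in_step, std::size_t out_step, int channels) {
    assert(channels == 3 || channels == 4);  // color images only in this sketch
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = pixel_offset(x, y, in_step, channels);
            const std::size_t o = pixel_offset(x, y, out_step, channels);
            const std::uint8_t v =
                is_target_red(in[i], in[i + 1], in[i + 2]) ? 0 : 255;
            out[o] = out[o + 1] = out[o + 2] = v;
            if (channels == 4) {
                out[o + 3] = 255;  // keep the output opaque
            }
        }
    }
}
```

In the kernel itself the loops disappear and x, y come from xIndex, yIndex; the offset expression stays the same. Passing the channel count as a parameter (for example from input.channels()) instead of hard-coding 3 or 4 avoids the striping that appears when the multiplier does not match the real pixel size.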
Source: https://stackoverflow.com/questions/65126907/change-cudagpumat-values-through-custom-kernel