pyCuda, issues sending multiple single variable arguments

Submitted by 给你一囗甜甜゛ on 2021-02-16 21:14:06

Question


I have a PyCUDA program here that reads in an image given on the command line and saves a copy with the colors inverted:

import pycuda.autoinit
import pycuda.driver as device
from pycuda.compiler import SourceModule as cpp

import numpy as np
import sys
import cv2

modify_image = cpp("""
__global__ void modify_image(int pixelcount, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
""").get_function("modify_image")

print("Loading image")

image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED).astype(np.uint8)

print("Processing image")

pixels = image.shape[0] * image.shape[1]
newchannels = []
for channel in cv2.split(image):
  output = np.zeros_like(channel)
  modify_image(
    device.In(np.int32(pixels)),
    device.In(channel),
    device.Out(output),
    block=(1024,1,1), grid=(pixels // 1024 + 1, 1))
  newchannels.append(output)
finalimage = cv2.merge(newchannels)

print("Saving image")

cv2.imwrite("processed.png", finalimage)

print("Done")

It works perfectly fine, even on larger images. However, while trying to expand the program's functionality, I ran into a strange issue: adding a second scalar argument to the kernel makes the program fail completely, and it simply saves an all-black image. The following code does not work:

import pycuda.autoinit
import pycuda.driver as device
from pycuda.compiler import SourceModule as cpp

import numpy as np
import sys
import cv2

modify_image = cpp("""
__global__ void modify_image(int pixelcount, int width, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}
""").get_function("modify_image")

print("Loading image")

image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED).astype(np.uint8)

print("Processing image")

pixels = image.shape[0] * image.shape[1]
newchannels = []
for channel in cv2.split(image):
  output = np.zeros_like(channel)
  modify_image(
    device.In(np.int32(pixels)),
    device.In(np.int32(image.shape[0])),
    device.In(channel),
    device.Out(output),
    block=(1024,1,1), grid=(pixels // 1024 + 1, 1))
  newchannels.append(output)
finalimage = cv2.merge(newchannels)

print("Saving image")

cv2.imwrite("processed.png", finalimage)

print("Done")

where the only difference is on two lines: the kernel header and its call. The body of the kernel itself is unchanged, and yet this small addition completely breaks the program. Neither the compiler nor the interpreter throws any errors. I have no idea how to begin debugging it, and am thoroughly confused.


Answer 1:


device.In and its relatives are designed for use with objects that support the Python buffer protocol (like numpy arrays). The source of your problem is using them to transfer scalars: device.In copies its argument into device memory and hands the kernel a pointer to that copy, whereas your kernel signature declares plain int values.

Just pass your scalars, with the correct numpy dtype, directly to your kernel call. Don't use device.In for them. The fact that this worked in the original case was a complete accident.
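As a minimal sketch of that advice, the call from the question could look like the following. The helper names grid_size and invert_channel are hypothetical and not part of PyCUDA; the sketch assumes a CUDA context already exists (for example through import pycuda.autoinit, as in the question) and that kernel is the function compiled from the four-argument modify_image source.

```python
THREADS_PER_BLOCK = 1024


def grid_size(pixels, threads_per_block=THREADS_PER_BLOCK):
    """Number of blocks needed to cover `pixels` threads (ceiling division)."""
    return (pixels + threads_per_block - 1) // threads_per_block


def invert_channel(kernel, channel, width):
    """Launch the (int, int, unsigned char*, unsigned char*) kernel on one channel.

    Hypothetical helper: `kernel` is the result of get_function("modify_image").
    """
    # Imported lazily so this module can be loaded on a machine without a GPU.
    import numpy as np
    import pycuda.driver as device

    channel = np.ascontiguousarray(channel, dtype=np.uint8)
    output = np.zeros_like(channel)
    pixels = channel.size
    kernel(
        np.int32(pixels),     # scalar: passed by value, no device.In
        np.int32(width),      # scalar: passed by value, no device.In
        device.In(channel),   # array: copied to the device, passed as a pointer
        device.Out(output),   # array: copied back to the host after the launch
        block=(THREADS_PER_BLOCK, 1, 1),
        grid=(grid_size(pixels), 1))
    return output
```

Inside the loop from the question this would be used as newchannels.append(invert_channel(modify_image, channel, image.shape[0])). The ceiling division is a small design choice of the sketch: unlike pixels // 1024 + 1, it does not launch an extra, fully idle block when the pixel count is an exact multiple of the block size.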




Answer 2:


Okay, so changing the scalar arguments to pointers in the kernel fixed the code; I'm not sure how or why. Here is the modified version of the kernel:

__global__ void modify_image(int* pixelcount, int* width, unsigned char* inputimage, unsigned char* outputimage)
{
  int id = threadIdx.x + blockIdx.x * blockDim.x;
  if (id >= *pixelcount)
    return;

  outputimage[id] = 255 - inputimage[id];
}

The remainder of the code is unchanged. If anybody wants to explain why this is a successful fix, I would greatly appreciate it.
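One plausible explanation can be sketched with Python's struct module. It rests on two assumptions that are not stated in this thread: that kernel arguments are laid out like the fields of a natively aligned C struct, and that the host is a 64-bit platform with 4-byte ints and 8-byte pointers. The offsets helper is hypothetical and only compares layouts; it does not touch the GPU.

```python
import struct

# Native-alignment struct formats: "i" is a C int, "P" is a pointer (void*).
KERNEL_ONE_INT = "@iPP"    # modify_image(int, uchar*, uchar*)
KERNEL_TWO_INTS = "@iiPP"  # modify_image(int, int, uchar*, uchar*)
SENT_THREE_PTRS = "@PPP"   # what three device.In/Out arguments supply
SENT_FOUR_PTRS = "@PPPP"   # what four device.In/Out arguments supply


def offsets(fmt):
    """Return the byte offset of every field in a native-aligned struct format."""
    result = []
    for n in range(1, len(fmt)):  # fmt[0] is the "@" alignment marker
        end = struct.calcsize(fmt[:n + 1])        # size up to and including field n
        field = struct.calcsize("@" + fmt[n])     # size of field n alone
        result.append(end - field)
    return result


for name, fmt in [("kernel, one int", KERNEL_ONE_INT),
                  ("sent, three pointers", SENT_THREE_PTRS),
                  ("kernel, two ints", KERNEL_TWO_INTS),
                  ("sent, four pointers", SENT_FOUR_PTRS)]:
    print(f"{name:22s} {offsets(fmt)}")
```

Under those assumptions, the one-int kernel and the three pointers agree on where each field starts, so the image pointers line up and pixelcount merely receives the low bytes of a device address; the original program therefore ran by accident. With two ints, the kernel looks for its image pointers in the slots that actually hold the addresses of the width buffer and the input image, so the real output buffer is never written. Declaring the parameters as int* makes the kernel signature match what device.In really sends, which is why the fix above works, although passing the scalars by value is simpler.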



Source: https://stackoverflow.com/questions/44125164/pycuda-issues-sending-multiple-single-variable-arguments
