PyCUDA+Threading = Invalid Handles on kernel invocations

The reason is context affinity. Every CUDA function instance is tied to a context, and they are not portable (the same applies to memory allocations and texture references). So each context must load the function instance separately, and then use the function handle returned by that load operation.
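To make that concrete, here is a minimal sketch (not from the original answer) with two contexts on a single device: each context has to compile or load its own copy of the module, and a function handle obtained in one context cannot be launched while another context is current.

import pycuda.driver as driver
from pycuda.compiler import SourceModule

driver.init()
dev = driver.Device(0)
src = "__global__ void noop() { }"

ctx_a = dev.make_context()                        # ctx_a is now current
func_a = SourceModule(src).get_function("noop")   # handle tied to ctx_a
ctx_a.pop()

ctx_b = dev.make_context()                        # ctx_b is now current
func_b = SourceModule(src).get_function("noop")   # separate handle for ctx_b
func_b(block=(1, 1, 1), grid=(1, 1))              # fine: handle matches context
# func_a(block=(1, 1, 1), grid=(1, 1))            # fails: handle belongs to ctx_a
ctx_b.pop()
# (cleanup of the contexts omitted for brevity)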

If you are not using metaprogramming at all, you might find it simpler to compile your CUDA code to a cubin file, and then load the functions you need from the cubin to each context with driver.module_from_file. Cutting and pasting directly from some production code of mine:

# Context establishment
# (snippet from a class method; it assumes the enclosing module has
#  import numpy as np, from warnings import warn,
#  import pycuda.driver as driver and from pycuda import tools,
#  and that autoinit is a boolean flag passed in by the caller)
try:
    if (autoinit):
        import pycuda.autoinit
        self.context = None
        self.device = pycuda.autoinit.device
        self.computecc = self.device.compute_capability()
    else:
        driver.init()
        self.context = tools.make_default_context()
        self.device = self.context.get_device()
        self.computecc = self.device.compute_capability()

    # GPU code initialization:
    # load pre-compiled CUDA code from a cubin file,
    # selecting the cubin based on the supplied dtype.
    # The symbol names in the cubin carry C++ mangling because of
    # templating. Ugly, but there is no easy way around it.
    if self.computecc == (1,3):
        self.fimcubin = "fim_sm13.cubin"
    elif self.computecc[0] == 2:
        self.fimcubin = "fim_sm20.cubin"
    else:
        raise NotImplementedError("GPU architecture not supported")

    fimmod = driver.module_from_file(self.fimcubin)

    IterateName32 = "_Z10fimIterateIfLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"
    IterateName64 = "_Z10fimIterateIdLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"

    if (self.dtype == np.float32):
        IterateName = IterateName32
    elif (self.dtype == np.float64):
        IterateName = IterateName64
    else:
        raise TypeError

    self.fimIterate = fimmod.get_function(IterateName)

except ImportError:
    warn("Could not initialise CUDA context")

Typical: as soon as I write the question, I work it out.

The issue was that the SourceModule was being invoked outside of an active context. To fix it I moved the SourceModule invocation into the thread's run function, below the CUDA context setup.
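For anyone hitting the same thing, here is a minimal sketch of that pattern (the kernel and class names are illustrative, not my actual code): the thread creates its own context first inside run, and only then calls SourceModule, so the resulting function handle belongs to the context the launches happen in.

import threading
import numpy as np
import pycuda.driver as driver
from pycuda.compiler import SourceModule

KERNEL_SRC = """
__global__ void scale(float *x, float a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    x[i] *= a;
}
"""

class GPUWorker(threading.Thread):
    def __init__(self, device_id=0):
        threading.Thread.__init__(self)
        self.device_id = device_id

    def run(self):
        driver.init()
        ctx = driver.Device(self.device_id).make_context()   # context setup first
        try:
            # SourceModule now compiles inside this thread's active context,
            # so the returned function handle is valid for launches made here
            scale = SourceModule(KERNEL_SRC).get_function("scale")

            x = np.arange(256, dtype=np.float32)
            scale(driver.InOut(x), np.float32(2.0),
                  block=(256, 1, 1), grid=(1, 1))
            ctx.synchronize()
        finally:
            ctx.pop()

worker = GPUWorker()
worker.start()
worker.join()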

Leaving this up for a while because I'm sure someone else has a better explanation!
