What's the 'right' way to implement a 32-bit memset for CUDA?

♀尐吖头ヾ 提交于 2019-12-20 04:04:29

问题


CUDA has the API call

cudaError_t cudaMemset (void *devPtr, int value, size_t count)

which fills a buffer with a single-byte value. I want to fill it with a multi-byte value. Suppose, for the sake of simplicity, that I want to fill devPtr with a 32-bit (4-byte) value, and suppose we can ignore endianness. Now, the CUDA driver has the following API call:

CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)

So is it enough for me to just: obtain the CUdeviceptr from the device-memory-space pointer, then make the driver API call? Or is there something else I need to be doing?


回答1:


As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32 to fill a runtime API allocation with a 32 bit value. The size of CUdeviceptr will match the size of void *on you platform and it is safe to cast a pointer from the CUDA API to CUdeviceptr or vice versa.




回答2:


Based on talonmies' answer, it seems a reasonable (though ugly) approach would be:

#include <stdint.h>
inline cudaError_t cudaMemsetTyped<T>(void *devPtr, T value, size_t count);

#define INSTANTIATE_CUDA_MEMSET_TYPED(_nbits) \
inline cudaError_t cudaMemsetTyped<int ## _nbits ## _t>(void *devPtr, int ## _nbits ## _t value, size_t count) { \
    cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), value, count); \
} \
inline cudaError_t cudaMemsetTyped<uint ## _nbits ## _t>(void *devPtr, uint ## _nbits ## _t value, size_t count) { \
    cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), reinterpret_cast<uint ## _nbits ## _t>(value), count); \
} \

INSTANTIATE_CUDA_MEMSET_TYPED(8)
INSTANTIATE_CUDA_MEMSET_TYPED(16)
INSTANTIATE_CUD_AMEMSET_TYPED(32)

#undef INSTANTIATE_CUDA_MEMSET_TYPED(_nbits)

inline cudaError_t cudaMemsetTyped<float>(void *devPtr, float value, size_t count) {
    cuMemsetD32( reinterpret_cast<CUdeviceptr>(devPtr), reinterpret_cast<int>(value), count);
}

(no cuMemset64 it seems, so no double either)



来源:https://stackoverflow.com/questions/22483955/whats-the-right-way-to-implement-a-32-bit-memset-for-cuda

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!