Question
I would like to declare the alignment for a global device variable in CUDA. Specifically, I have a string declaration, like
__device__ char str1[] = "some pre-defined string";
In normal gcc, I can request alignment from the compiler as
__device__ char str1[] __attribute__ ((aligned (4))) = "some pre-defined string";
However, when I tried this with nvcc, the compiler ignored the request. The reason I would like to do this is to copy these strings into a buffer in my kernels. Copying a word at a time is much faster than copying a byte at a time, but it requires that the source string be word-aligned. Can anyone please tell me how to request alignment from the nvcc compiler?
Answer 1:
See section 5.3.2, "Size and Alignment Requirement", of the "CUDA C Programming Guide":
The alignment requirement is automatically fulfilled for the built-in vector types based on char, short, int, long, long long, float, and double, such as float2 or float4.
For structs, the size and alignment requirements can be enforced by the compiler using the alignment specifiers __align__(8) or __align__(16).
Example usage:
struct __align__(8) {
float r;
float i;
} complex_num;
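To see what the specifier actually changes, here is a rough sketch that compares the same struct with and without __align__(8). The names ComplexPlain, ComplexAligned and scale are made up for illustration and do not come from the guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical struct without a specifier: it only inherits float's alignment.
struct ComplexPlain { float r; float i; };

// Same layout, but the compiler is asked to place it on 8-byte boundaries.
struct __align__(8) ComplexAligned { float r; float i; };

// Compile-time checks of the effect of the specifier.
static_assert(alignof(ComplexPlain) == 4, "plain struct follows float alignment");
static_assert(alignof(ComplexAligned) == 8, "__align__(8) raises the alignment");
static_assert(sizeof(ComplexAligned) == 8, "size is unchanged at 8 bytes");

// Each element is 8 bytes and 8-byte aligned, so the load and the store below
// can each be compiled to a single 8-byte memory instruction.
__global__ void scale(ComplexAligned *data, float s, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        ComplexAligned c = data[idx];
        c.r *= s;
        c.i *= s;
        data[idx] = c;
    }
}

int main() {
    const int n = 4;
    ComplexAligned host[n] = {{1.f, 2.f}, {3.f, 4.f}, {5.f, 6.f}, {7.f, 8.f}};
    ComplexAligned *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    scale<<<1, n>>>(dev, 2.0f, n);
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("first element after scaling: %g %g\n", host[0].r, host[0].i);
    return 0;
}
```

The static_assert lines are the useful part: if the compiler ignored the specifier, the build would fail instead of silently producing misaligned data.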
Answer 2:
Can you check if this works?
__device__ char __align__(4) str1[] = "some pre-defined string";
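One way to check this is a small test program along the following lines. It is only a sketch: it assumes the string is declared as an array (str1[]) and that nvcc accepts __align__ on a __device__ variable, and the helpers is_aligned4, copy_words, copy_kernel and run_copy are names invented here. Rather than trusting the specifier, the kernel reports whether the address really is 4-byte aligned, and the copy falls back to bytes if it is not.

```cuda
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <cuda_runtime.h>

// The string from the question, as an array with a 4-byte alignment request.
__device__ __align__(4) char str1[] = "some pre-defined string";

// True if p sits on a 4-byte boundary.
__device__ bool is_aligned4(const void *p) {
    return reinterpret_cast<uintptr_t>(p) % 4 == 0;
}

// Copies n bytes. Whole 4-byte words are used only when both pointers are
// 4-byte aligned; the tail (or everything, if unaligned) is copied bytewise.
__device__ void copy_words(char *dst, const char *src, size_t n) {
    size_t i = 0;
    if (is_aligned4(dst) && is_aligned4(src)) {
        const uint32_t *s = reinterpret_cast<const uint32_t *>(src);
        uint32_t *d = reinterpret_cast<uint32_t *>(dst);
        for (; i + 4 <= n; i += 4) d[i / 4] = s[i / 4];
    }
    for (; i < n; ++i) dst[i] = src[i];
}

// Records whether str1 is aligned, then copies it (including the NUL) to out.
__global__ void copy_kernel(char *out, int *src_aligned) {
    *src_aligned = is_aligned4(&str1[0]) ? 1 : 0;
    copy_words(out, &str1[0], sizeof(str1));
}

// Host helper: runs the kernel and fetches the copied string and the flag.
cudaError_t run_copy(char *h_out, int *h_aligned) {
    char *d_out = nullptr;
    int *d_aligned = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(str1));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_aligned, sizeof(int));
    if (err != cudaSuccess) { cudaFree(d_out); return err; }
    copy_kernel<<<1, 1>>>(d_out, d_aligned);
    err = cudaMemcpy(h_out, d_out, sizeof(str1), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_aligned, d_aligned, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_aligned);
    return err;
}

int main() {
    char out[sizeof(str1)] = {0};
    int aligned = 0;
    cudaError_t err = run_copy(out, &aligned);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("source 4-byte aligned: %d, copy matches: %d\n",
           aligned, strcmp(out, "some pre-defined string") == 0);
    return 0;
}
```

If the reported flag is 0, the alignment request was not honored for this declaration. The copy still works in that case, just a byte at a time.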
Source: https://stackoverflow.com/questions/13205742/how-to-specify-alignment-for-global-device-variables-in-cuda