When to use volatile with register/local variables
问题 What is the meaning of declaring register arrays in CUDA with volatile qualifier? When I tried with volatile keyword with a register array, it removed the number of spilled register memory to local memory. (i.e. Force the CUDA to use registers instead of local memory) Is this the intended behavior? I did not find any information about the usage of volatile with regard to register arrays in CUDA documentation. Here is the ptxas -v output for both versions With volatile qualifier __volatile__