Is there any way on CUDA 2.0 devices to disable L1 cache only for one specific variable?
I know that one can disable L1 cache at compile time adding the flag -Xptxas
Inline PTX can be used to load and store the variable. ld.cg and st.cg instructions only cache data in L2. The cache operators are described in section 8.7.8.1 Cache Operators of the PTX ISA 2.3 document. The instructions or interest are ld and st. Inline PTX is described in Using Inline PTX Assembly in CUDA.