Question
I'm trying to measure peak single-precision FLOPS on my GPU. To do that, I'm modifying a PTX file to perform successive MAD instructions on registers. Unfortunately, the compiler removes all of the code because it does nothing useful: I never load or store the data. Is there a compiler flag or pragma I can add so that the compiler leaves the code alone?
Thanks.
Answer 1:
I don't think there is any way to turn off this optimization in the compiler. You can work around it by adding code that stores your values and wrapping that code in a conditional statement that is always false. To make a conditional that the compiler can't prove is always false, base it on at least one variable whose value is known only at run time (for example, a kernel argument), not just on constants.
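A minimal sketch of this workaround in CUDA C follows. It is an illustration under stated assumptions, not a tuned peak-FLOPS benchmark: the kernel name fma_loop, the never_true argument, and the launch sizes are all hypothetical, and two dependency chains per thread will probably not saturate the hardware. The point is only that the store depends on a run-time kernel argument, so the compiler has to keep the arithmetic that feeds it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical benchmark kernel; names and constants are illustrative only.
__global__ void fma_loop(float *out, float a, float b, int iters, int never_true)
{
    // Per-thread starting values, so the results are not compile-time constants.
    float x = threadIdx.x * 1.0e-3f;
    float y = x + 1.0f;

    for (int i = 0; i < iters; ++i) {
        x = fmaf(x, a, b);  // two independent fused multiply-add chains
        y = fmaf(y, a, b);
    }

    // never_true is a kernel argument, so the compiler cannot prove this
    // branch is dead and must keep the computation of x and y alive.
    if (never_true) {
        out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
    }
}

int main()
{
    const int blocks = 64, threads = 256, iters = 1 << 16;

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    // Pass 0 at run time: the store is never executed, only the arithmetic.
    fma_loop<<<blocks, threads>>>(d_out, 0.999f, 0.001f, iters, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "kernel launch failed\n");
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // 2 FMAs per iteration per thread, counted as 2 flops each.
    double flops = 4.0 * iters * (double)blocks * threads;
    printf("%.1f GFLOP/s\n", flops / (ms * 1.0e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

The same idea carries over to hand-written PTX: guard the st.global instruction with a predicate derived from a kernel parameter rather than from a constant.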
Answer 2:
To completely disable optimizations with nvcc, you can use the following:
nvcc -O0 -Xopencc -O0 -Xptxas -O0 // sm_1x targets using Open64 frontend
nvcc -O0 -Xcicc -O0 -Xptxas -O0 // sm_2x and sm_3x targets using NVVM frontend
Note that the resulting code may be extremely slow. The -O0 flag is passed to the host compiler to disable host code optimization. The -Xopencc -O0 and -Xcicc -O0 flags control the compiler frontend (the part that produces PTX) and turn off optimizations there. The -Xptxas -O0 flag controls the compiler backend (the part that converts PTX to machine code) and turns off optimizations in that part. Note that -Xopencc, -Xcicc, and -Xptxas flags are component-level flags, and unless documented in the nvcc manual, should be considered unsupported.
Answer 3:
(I am still on CUDA 4.0; this may have changed in newer versions.)
To disable optimizations in ptxas (the tool that converts PTX into cubin), pass the option --opt-level 0 (the default is --opt-level 3). If you want to pass this option through nvcc, you need to prefix it with --ptxas-options.
Do note, however, that ptxas performs many useful optimizations; disabling them may make your code much slower, or possibly even incorrect. For example, it does register allocation and tries to work out which memory accesses go to shared memory and which go to global memory.
Answer 4:
These worked for me:
-g -G -Xcompiler -O0 -Xptxas -O0 -lineinfo -O0
Answer 5:
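As far as I know, there is no compiler flag or pragma for that, but you can compute more and store less: do many arithmetic operations per thread and store only a small result at the end, so the cost of the stores is negligible next to the computation.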
Source: https://stackoverflow.com/questions/11821605/completely-disable-optimizations-on-nvcc