nvcc | 易学教程

Why do gcc and NVCC (g++) see two different structure sizes?

I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s. To do this I need to mix two languages, C and C++ (nvcc is a C++ compiler). The problem is that the C++ compiler sees a structure as a certain size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I can't find a cause for a 4-byte discrepancy:

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o

My C++ looks like
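The excerpt is cut off before the asker's code. Independently of that code, a size mismatch like this can usually be narrowed down by printing the structure's size and member offsets from each compiler and comparing them. The sketch below is only an illustration: `tree_like` and its members are hypothetical stand-ins, not the real `tree` structure, and `layout_report` is a name made up for this example.

```cpp
#include <cstddef>   // offsetof
#include <cstdio>    // std::snprintf
#include <string>

// Hypothetical stand-in for the real `tree` structure; substitute the
// members from the header that both compilers include.
struct tree_like {
    int    id;
    char   tag;
    double weight;
    int    children[4];
};

// Builds a one-line report of the total size and each member's offset.
// Printing the same values from the C side (with plain printf) and diffing
// the two lines shows the first member whose offset differs.
std::string layout_report() {
    char buf[160];
    std::snprintf(buf, sizeof buf,
                  "sizeof=%zu id=%zu tag=%zu weight=%zu children=%zu",
                  sizeof(tree_like),
                  offsetof(tree_like, id),
                  offsetof(tree_like, tag),
                  offsetof(tree_like, weight),
                  offsetof(tree_like, children));
    return std::string(buf);
}
```

The first offset that differs between the two reports identifies the member (or the padding in front of it) that the two compilers lay out differently.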

Is there any documentation for NVCC's `#pragma nv_exec_check_disable` and/or `#pragma hd_warning_disable`?

Question: Some projects use #pragma nv_exec_check_disable and/or #pragma hd_warning_disable to silence NVCC warnings such as `warning: calling a __host__ function from a __host__ __device__ function is not allowed`. However, they seem completely undocumented, e.g. in the CUDA 9.1 reference. Is there any relevant documentation anywhere? Source: https://stackoverflow.com/questions/49328130/is-there-any-documentation-for-nvccs-pragma-nv-exec-check-disable-and-or-p
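For context, here is a rough sketch of how such a pragma tends to appear in code: directly in front of a `__host__ __device__` template. Since the pragma is undocumented, the placement shown here is an assumption based on how open-source projects appear to use it, not on NVIDIA documentation. `HD`, `apply_to_seven`, and `add_one` are hypothetical names for this example, and the pragma is guarded so the file also compiles as plain C++.

```cpp
// HD expands to the CUDA execution-space qualifiers only under nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical functor used to exercise the template below.
struct add_one {
    HD int operator()(int v) const { return v + 1; }
};

// Assumed placement: the pragma sits immediately before the host/device
// template whose instantiations may call host-only code.
#ifdef __CUDACC__
#pragma nv_exec_check_disable
#endif
template <class F>
HD int apply_to_seven(F f) {
    return f(7);
}
```

On the host, `apply_to_seven(add_one())` evaluates to 8. Whether the pragma actually silences the warning for a given nvcc version is something to verify locally.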

Completely disable optimizations on NVCC

I'm trying to measure peak single-precision FLOPS on my GPU. To do that, I'm modifying a PTX file to perform successive MAD instructions on registers. Unfortunately, the compiler removes all of the code because it does nothing useful: I never load or store the data. Is there a compiler flag or pragma I can add so that the compiler does not touch the code? Thanks.

I don't think there is any way to turn off such optimization in the compiler. You can work around this by adding code to store your values and wrapping that code in a conditional statement that is always false.
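The workaround in the answer can be sketched roughly as follows. This is an illustration at the C++ source level rather than in PTX, and `HD`, `mad_chain`, `never`, and `sink` are hypothetical names. The key assumption is that the guard depends on a value known only at run time, so the compiler cannot prove that the store never happens.

```cpp
// HD expands to the CUDA execution-space qualifiers only under nvcc, so the
// same sketch also compiles as plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Runs `iters` dependent multiply-adds on local (register) values.
// The only store is guarded by `never`, which the caller passes as 0 at
// run time. Because the compiler cannot see that value, it has to keep the
// arithmetic that feeds the store.
HD inline void mad_chain(float a, float b, int iters, int never, float* sink) {
    float x = a;
    for (int i = 0; i < iters; ++i) {
        x = x * b + a;   // candidate for a single multiply-add instruction
    }
    if (never != 0) {    // false in practice, but opaque to the compiler
        *sink = x;
    }
}
```

In a kernel, `never` would typically be a kernel argument that the host sets to 0, and `sink` a pointer to a small device buffer that is never actually written.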

CUDA compiler (nvcc) macro

Is there a predefined compiler macro for CUDA (nvcc) that I can use, like _WIN32 for Windows and so on? I need this for header code that will be shared between the nvcc and VC++ compilers. I know I can define my own and pass it to nvcc with -D, but it would be great if one is already defined.

__CUDACC__

I don't think it will be that trivial. Check the following thread: http://forums.nvidia.com/index.php?showtopic=32369&st=0&p=179913&#entry179913 N. Pattakos

I know it has been a long time now, but you might also find __CUDA_ARCH__ useful. Source: https:/
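As a minimal sketch of how these two macros might be used in a shared header: `__CUDACC__` is defined when nvcc compiles a CUDA source file, and `__CUDA_ARCH__` is defined only during the device compilation pass. The names `SHARED_FN`, `square`, and `in_device_pass` are hypothetical and exist only for this example.

```cpp
// Expand to CUDA qualifiers under nvcc, and to nothing for a host-only
// compiler such as VC++ or g++.
#ifdef __CUDACC__
#define SHARED_FN __host__ __device__
#else
#define SHARED_FN
#endif

// Usable from host code everywhere, and from device code when built by nvcc.
SHARED_FN inline int square(int x) {
    return x * x;
}

// Reports whether this body was compiled in the device pass.
// On the host side it returns false, with or without nvcc.
SHARED_FN inline bool in_device_pass() {
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}
```

With this pattern the header needs no extra -D flag: a host-only compiler simply sees ordinary inline functions.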

Bad GPU performance when compiling with -G parameter with nvcc compiler

I am running some tests and noticed that compiling with the -G option gives worse performance than compiling without it. I checked the NVIDIA documentation: --device-debug (-G): Generate debug information for device code. But that does not explain why the performance is so bad. Where and when is this debug information generated, and what could cause the slowdown?

Using the -G switch disables most compiler optimizations that nvcc might do in device code. The resulting code will often run slower than code that is not compiled with -G
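One way to avoid benchmarking a -G build by accident is to have the program report how it was compiled. The sketch below relies on nvcc's predefined macro `__CUDACC_DEBUG__`, which nvcc documents as defined when CUDA source files are compiled in device-debug mode. The function name `device_build_kind` is hypothetical.

```cpp
// Returns a short label describing how this translation unit was built.
inline const char* device_build_kind() {
#if defined(__CUDACC_DEBUG__)
    return "nvcc with -G (device optimizations mostly disabled)";
#elif defined(__CUDACC__)
    return "nvcc without -G";
#else
    return "not compiled by nvcc";
#endif
}
```

Printing this label next to the timing results makes it obvious when a measurement came from a debug build.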

Getting error: “nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'”

I am getting the following error when running make for my CUDA (v7.5) application:

nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'

I'm not sure why. It seems something is likely wrong with my Makefile. Here it is -- any ideas what might be causing the error? Thank you in advance!

CC = nvcc
CFLAGS = -std=c++11 -m64 -arch=compute_35 -code=sm_35 --compiler-options=-Wall,-Wno-unused-function,-Wno-unused-local-typedef,-Wno-unused-private-field
LFLAGS = -L/usr/local/lib -Llib $(shell pkg-config --libs libmongoc-1.0 libbson-1.0)
INCLUDES = -I/usr/include -I/usr/local/include
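A useful first step with an error like this is to demangle the symbol so it is clear which function nvlink cannot find. The sketch below uses `abi::__cxa_demangle` from the GCC/Clang C++ runtime; the wrapper name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ runtime)
#include <cstdlib>    // std::free
#include <string>

// Converts an Itanium-ABI mangled symbol into a readable signature.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With null buffer arguments the result is malloc-allocated.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return std::string(mangled);
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For the symbol in the error message, `demangle("_ZN8Strategy8backtestEPddd")` yields `Strategy::backtest(double*, double, double)`, so the missing definition is the device-side `backtest` member of `Strategy`.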

CUDA SASS to Cubin

With cuobjdump, SASS can be generated from a cubin file using cuobjdump -sass <input file>. But is there any way to convert the SASS back to a cubin?

There are no "assemblers" provided as part of the official NVIDIA CUDA toolchain. The NVIDIA toolchain can take CUDA C/C++, or PTX, and convert it to a cubin or other executable format. However, there are some community-developed assemblers. Perhaps the most recent one at this time (probably the only one worth considering at this time) is maxas. There was also an older one, asfermi, developed in the Fermi generation of CUDA GPUs. I don't think it has