nvcc

Why do gcc and NVCC (g++) see two different structure sizes?

痞子三分冷 posted on 2019-12-04 05:58:21
I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s. To do this I need to mix two languages, C and C++ (nvcc is a C++ compiler). The problem is that the C++ compiler sees a structure as one size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I can't find a cause for the 4-byte discrepancy:

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o

My C++ looks like
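One way to narrow down a discrepancy like this is to print the size and member offsets of the structure from each compiler and compare the two outputs. The sketch below is only illustrative: `tree_demo` and `dump_tree_demo_layout` are hypothetical names, and the members are placeholders, since the real `tree` structure is not shown in the question.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the real `tree` structure (not shown in the question).
struct tree_demo {
    char   tag;
    int    count;
    double weight;
};

// Print the total size and each member offset. Building this with both
// compilers and comparing the output shows which member first lands at a
// different offset.
inline void dump_tree_demo_layout() {
    std::printf("sizeof(tree_demo) = %zu\n", sizeof(tree_demo));
    std::printf("offsetof(tag)     = %zu\n", offsetof(tree_demo, tag));
    std::printf("offsetof(count)   = %zu\n", offsetof(tree_demo, count));
    std::printf("offsetof(weight)  = %zu\n", offsetof(tree_demo, weight));
}

// Compile-time sanity check: the structure is at least as large as the sum of
// its members. A stricter check against an exact expected size could be used
// once the intended layout is known.
static_assert(sizeof(tree_demo) >= sizeof(char) + sizeof(int) + sizeof(double),
              "tree_demo is smaller than the sum of its members");
```

Calling `dump_tree_demo_layout()` from a small `main` in each build and diffing the two outputs points at the first member whose offset differs.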

Is there any documentation for NVCC's `#pragma nv_exec_check_disable` and/or `#pragma hd_warning_disable`?

谁说我不能喝 posted on 2019-12-04 03:43:52
Question: Some projects use `#pragma nv_exec_check_disable` and/or `#pragma hd_warning_disable` to silence NVCC warnings such as

warning: calling a __host__ function from a __host__ __device__ function is not allowed

However, they seem completely undocumented, e.g. in the CUDA 9.1 reference. Is there any relevant documentation anywhere?

Source: https://stackoverflow.com/questions/49328130/is-there-any-documentation-for-nvccs-pragma-nv-exec-check-disable-and-or-p

Completely disable optimizations on NVCC

坚强是说给别人听的谎言 posted on 2019-12-03 21:16:48
I'm trying to measure peak single-precision FLOPS on my GPU. To do that, I'm modifying a PTX file to perform successive MAD instructions on registers. Unfortunately, the compiler removes all the code because it does nothing useful: I never load or store the data. Is there a compiler flag or pragma I can add so the compiler does not touch it? Thanks.

I don't think there is any way to turn off such optimization in the compiler. You can work around this by adding code to store your values and wrapping that code in a conditional statement that is always false. The condition has to depend on a value the compiler cannot evaluate at compile time; otherwise the store is removed along with the arithmetic.
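The workaround in the answer can be sketched in plain C++ as follows. This is a host-side analogue, not the asker's PTX: `mad_chain` and its parameters are hypothetical names. In a real kernel the flag would typically be a kernel argument, which the compiler cannot treat as a constant.

```cpp
// Run `iters` dependent multiply-add steps and store the result only when
// `store` is true. Because `store` is a run-time value, the compiler cannot
// prove the store is dead, so the arithmetic feeding it has to be kept, even
// if every caller passes false.
// Caveat: if the call is inlined with a literal `false`, an optimizer may
// still remove the work; pass the flag in from outside (for example from the
// command line, or as a kernel argument in CUDA).
inline void mad_chain(float a, float b, int iters, bool store, float* out) {
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        acc = acc * a + b;  // one multiply-add per iteration
    }
    if (store) {
        *out = acc;  // never executed during measurement, but keeps `acc` live
    }
}
```

For measurement, the flag is passed as false so no memory traffic is added; passing true is only useful for checking that the chain computes what is expected.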

CUDA compiler (nvcc) macro

做~自己de王妃 posted on 2019-12-03 05:51:22
Is there a predefined compiler macro for CUDA (nvcc) that I can use, like _WIN32 for Windows? I need this for header code that will be shared between the nvcc and VC++ compilers. I know I can define my own and pass it to nvcc with -D, but it would be great if one were already defined.

__CUDACC__

I don't think it will be that trivial. Check the following thread: http://forums.nvidia.com/index.php?showtopic=32369&st=0&p=179913&#entry179913

N. Pattakos

I know it has been a long time now, but you might also find __CUDA_ARCH__ useful.

Source: https:/
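A shared header along the lines the question describes might use `__CUDACC__` as sketched below. The names `DEMO_HOST_DEVICE`, `scale`, and `compiler_name` are hypothetical and exist only for illustration. `__CUDACC__` is defined by nvcc when it compiles CUDA source files, while `__CUDA_ARCH__` is defined only during the device compilation pass.

```cpp
// Expand to the CUDA qualifiers only when nvcc is compiling this header;
// other compilers (VC++, g++) see an empty macro.
#ifdef __CUDACC__
#define DEMO_HOST_DEVICE __host__ __device__
#else
#define DEMO_HOST_DEVICE
#endif

// A function usable from host code everywhere, and from device code when
// the header is compiled by nvcc.
DEMO_HOST_DEVICE inline float scale(float x, float k) {
    return x * k;
}

// Report which kind of compiler processed this header.
inline const char* compiler_name() {
#ifdef __CUDACC__
    return "nvcc";
#else
    return "host compiler";
#endif
}
```

With a plain host compiler the qualifier macro expands to nothing, so the same header builds unchanged in both toolchains.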

Bad GPU performance when compiling with -G parameter with nvcc compiler

你。 posted on 2019-12-02 16:23:06
Question: I am running some tests and noticed that compiling with the -G parameter gives me worse performance than compiling without it. I checked the NVIDIA documentation:

--device-debug (-G) Generate debug information for device code.

But that does not explain why performance is so bad. Where and when is this debug information generated, and what could be causing the slowdown?

Answer 1: Using the -G switch disables most compiler optimizations that

Getting error: “nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'”

守給你的承諾、 posted on 2019-12-02 14:17:08
Question: I am getting the following error when running make for my CUDA (v7.5) application:

nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'

I'm not sure why. It seems likely that something is wrong with my Makefile. Here it is; any ideas what might be causing the error? Thank you in advance!

CC = nvcc
CFLAGS = -std=c++11 -m64 -arch=compute_35 -code=sm_35 --compiler-options=-Wall,-Wno-unused-function,-Wno-unused-local-typedef,-Wno-unused-private-field
LFLAGS = -L/usr/local/lib -Llib $

CUDA SASS to Cubin

ⅰ亾dé卋堺 posted on 2019-12-02 11:37:30
Question: With cuobjdump, SASS can be generated from a cubin file using cuobjdump -sass <input file>, but is there any way to convert the SASS back to a cubin?

Answer 1: There are no "assemblers" provided as part of the official NVIDIA CUDA toolchain. The NVIDIA toolchain can take CUDA C/C++, or PTX, and convert it to a cubin or other executable format. However, there are some community-developed assemblers. Perhaps the most recent one at this time (probably the only one worth considering at this time) is maxas.

Bad GPU performance when compiling with -G parameter with nvcc compiler

白昼怎懂夜的黑 posted on 2019-12-02 10:26:15
I am running some tests and noticed that compiling with the -G parameter gives me worse performance than compiling without it. I checked the NVIDIA documentation:

--device-debug (-G) Generate debug information for device code.

But that does not explain why performance is so bad. Where and when is this debug information generated, and what could be causing the slowdown?

Using the -G switch disables most compiler optimizations that nvcc might do in device code. The resulting code will often run slower than code that is not compiled with -G

Getting error: “nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'”

会有一股神秘感。 posted on 2019-12-02 09:01:24
I am getting the following error when running make for my CUDA (v7.5) application:

nvlink error : Undefined reference to '_ZN8Strategy8backtestEPddd'

I'm not sure why. It seems likely that something is wrong with my Makefile. Here it is; any ideas what might be causing the error? Thank you in advance!

CC = nvcc
CFLAGS = -std=c++11 -m64 -arch=compute_35 -code=sm_35 --compiler-options=-Wall,-Wno-unused-function,-Wno-unused-local-typedef,-Wno-unused-private-field
LFLAGS = -L/usr/local/lib -Llib $(shell pkg-config --libs libmongoc-1.0 libbson-1.0)
INCLUDES = -I/usr/include -I/usr/local/include

CUDA SASS to Cubin

邮差的信 posted on 2019-12-02 04:42:53
With cuobjdump, SASS can be generated from a cubin file using cuobjdump -sass <input file>, but is there any way to convert the SASS back to a cubin?

There are no "assemblers" provided as part of the official NVIDIA CUDA toolchain. The NVIDIA toolchain can take CUDA C/C++, or PTX, and convert it to a cubin or other executable format. However, there are some community-developed assemblers. Perhaps the most recent one at this time (probably the only one worth considering at this time) is maxas. There was also an older one, asfermi, developed during the Fermi generation of CUDA GPUs. I don't think it has