nvcc

Can I make my compiler use fast-math on a per-function basis?

Submitted by 点点圈 on 2020-05-13 04:19:58
Question: Suppose I have template <bool UsesFastMath> void foo(float* data, size_t length); and I want to compile one instantiation with -ffast-math (--use_fast_math for nvcc) and the other instantiation without it. This can be achieved by instantiating each variant in a separate translation unit and compiling each with a different command line, with and without the switch. My question is whether it's possible to indicate to popular compilers (*) to apply or not apply -ffast-math

Can I override a CUDA host-and-device function with a host-only function?

Submitted by 蓝咒 on 2020-02-05 02:36:12
Question: Consider the following program: class A { __host__ __device__ void foo(); }; class B : A { __host__ void foo(); }; int main() { A a; (void) a; B b; (void) b; } This compiles (GodBolt) with nvcc 10. Yet, in more complex programs, I sometimes get the following error (line breaks added for readability): whatever.hpp(88): error: execution space mismatch: overridden entity (function "C::foo") is a __host__ __device__ function, but overriding entity (function "D::foo") is a __host__ function So, nvcc is
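The program in the question compiles because non-virtual foo in B merely hides A::foo; nvcc's "execution space mismatch" diagnostic concerns overriding, which requires virtual functions. A minimal sketch (class and function names hypothetical) of the situation that likely triggers the error:

```cuda
class C {
public:
    __host__ __device__ virtual void foo() {}
};

class D : public C {
public:
    // error: execution space mismatch: overridden entity (C::foo) is a
    // __host__ __device__ function, but this override is __host__ only
    __host__ virtual void foo() override {}
};
```

An override must have an execution space compatible with the virtual function it overrides, so narrowing __host__ __device__ down to __host__ in the derived class is rejected.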

Suppress “stack size cannot be dynamically determined” warnings?

Submitted by 半世苍凉 on 2020-01-25 10:21:07
Question: I'm getting a CUDA warning saying ptxas warning : Stack size for entry function '_Z13a_test_kernelv' cannot be statically determined. Now, I know what it means, and there's an SO question about why it happens. What I want is to suppress the warning (when compiling with nvcc 10.x). Can I? If so, where exactly do I put the warning suppression #pragma for this? Answer 1: Add -Xptxas -suppress-stack-size-warning when compiling with nvcc. Source: https://stackoverflow.com/questions/59328507/suppress-stack-size
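Note this is a command-line flag forwarded to ptxas, not a #pragma placed in the source. A sketch of the invocation (the file and output names are hypothetical):

```shell
# Forward the suppression flag through nvcc to ptxas (nvcc 10.x or later).
nvcc -Xptxas -suppress-stack-size-warning -o a_test a_test.cu
```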

nvcc is picking wrong libcudart library

Submitted by 杀马特。学长 韩版系。学妹 on 2020-01-24 21:37:14
Question: This problem occurs when I try to import Theano in GPU mode. While importing, Theano tries to compile some code, build a shared library from it, and load it. Here is the command used to build the .so file: nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=mc72d035fdf91890f3b36710688069b2e,\ -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker \ -rpath,/home/jay/.theano/compiledir_Linux-4.8--ARCH-x86_64-with-arch-Arch-Linux--3.6.0-64/cuda_ndarray \ -I/usr/lib

Make nvcc output traces on compile error

Submitted by 一个人想着一个人 on 2020-01-14 03:15:14
Question: I'm having trouble compiling some code with nvcc. It relies heavily on templates and the like, so the error messages are hard to read. For example, I'm currently getting the message /usr/include/boost/utility/detail/result_of_iterate.hpp:135:338: error: invalid use of qualified-name ‘std::allocator_traits<_Alloc>::propagate_on_container_swap’ which is not really helpful: no information on where it came from or what the template arguments were. Compiling with e.g. gcc shows some really nice output

Invoking nvcc.exe using CreateProcess

Submitted by 微笑、不失礼 on 2020-01-06 07:58:47
Question: We currently use a mock JIT compiler for CUDA, where nvcc.exe is invoked on some files and the resulting .ptx files are generated. bool executeWindowsProcess(ofstream &logFF) { STARTUPINFO si; PROCESS_INFORMATION pi; ZeroMemory( &si, sizeof(si) ); si.cb = sizeof(si); ZeroMemory( &pi, sizeof(pi) ); char cmd[] = "\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v5.0\\bin\\nvcc.exe\""; char args[] = "\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v5.0\\bin\\nvcc.exe\" --ptx -