cuda

LLVM retrieve name of AllocaInst

a 夏天 submitted on 2020-07-18 07:35:08

Question: I am trying to retrieve the name of the pointer passed to a cudaMalloc call.

    CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
    Value *Ptr = CUMallocCI->getOperand(0);
    if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
      errs() << AI->getName() << "\n";
    }

The above, however, just prints an empty line. Is it possible to get the pointer name out of this alloca? This is the relevant IR:

    %28 = alloca i8*, align 8
    ...
    ...
    call void @llvm.dbg.declare(metadata i8** %28, metadata !926,
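In optimized or even plain -O0 IR the alloca itself is often unnamed (%28 above), so the name usually has to be recovered from the debug intrinsics instead: the llvm.dbg.declare that references the alloca carries a DILocalVariable with the source-level name. Below is a minimal sketch of that lookup, assuming the module was built with -g so the intrinsics are present; getSourceName is a hypothetical helper, and depending on the call site you may first need Ptr->stripPointerCasts() before the dyn_cast, since cudaMalloc's operand is frequently a bitcast of the alloca.

    #include "llvm/IR/DebugInfoMetadata.h" // DILocalVariable
    #include "llvm/IR/InstIterator.h"      // instructions()
    #include "llvm/IR/Instructions.h"      // AllocaInst
    #include "llvm/IR/IntrinsicInst.h"     // DbgDeclareInst

    using namespace llvm;

    // Walk the enclosing function and look for an llvm.dbg.declare that
    // points at this alloca; its DILocalVariable holds the source name.
    static StringRef getSourceName(AllocaInst *AI) {
      Function *F = AI->getFunction();
      for (Instruction &I : instructions(*F)) {
        if (auto *DDI = dyn_cast<DbgDeclareInst>(&I)) {
          if (DDI->getAddress() == AI)
            return DDI->getVariable()->getName();
        }
      }
      return StringRef(); // no debug info found for this alloca
    }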

Tensorflow cannot open libcuda.so.1

微笑、不失礼 submitted on 2020-07-17 09:47:59

Question: I have a laptop with a GeForce 940 MX. I want to get Tensorflow up and running on the GPU. I installed everything from their tutorial page; now when I import Tensorflow, I get

    >>> import tensorflow as tf
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft
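The error in the title usually means the NVIDIA driver library (libcuda.so.1 ships with the driver, not with the CUDA toolkit) is not visible to the dynamic loader, either because no driver is installed or because its directory is missing from the library path. A small hedged check that reproduces the loader's view outside of TensorFlow (illustrative only; link with -ldl on older glibc, and the actual fix depends on your system):

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // Try to load the driver library exactly the way TensorFlow would.
      void *h = dlopen("libcuda.so.1", RTLD_NOW);
      if (!h) {
        std::printf("dlopen failed: %s\n", dlerror());
        return 1;
      }
      std::printf("libcuda.so.1 found and loaded\n");
      dlclose(h);
      return 0;
    }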

CUDA matrix transpose with shared memory

那年仲夏 submitted on 2020-07-16 08:04:49

Question: I need to implement a matrix transpose function on a GPU using shared memory. I have done it in a simple way without shared memory, which works fine, and also made an attempt with shared memory. But unfortunately the calculation is not correct and I cannot figure out why. A complete working example can be found here and at the bottom of this question. EDIT 1 I further know that the first index of the result where I have a wrong value is index 32 (of the flattened matrix, so matr[0][32] in a two dimensional
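For reference, the usual tiled pattern looks roughly like the sketch below: a minimal version assuming a 32x32 tile and matrix dimensions that are multiples of the tile size. The +1 padding column avoids shared-memory bank conflicts, and the __syncthreads() between the write and the read phase is the piece that most often goes missing when the first wrong element shows up exactly one tile width in, at index 32.

    #define TILE 32

    // Transpose an n x n row-major matrix (n assumed to be a multiple of TILE).
    __global__ void transposeShared(const float *in, float *out, int n) {
      __shared__ float tile[TILE][TILE + 1];     // +1 column avoids bank conflicts

      int x = blockIdx.x * TILE + threadIdx.x;   // column in the input
      int y = blockIdx.y * TILE + threadIdx.y;   // row in the input
      tile[threadIdx.y][threadIdx.x] = in[y * n + x];

      __syncthreads();                           // tile must be fully written first

      // Write the transposed tile: block (bx, by) lands at (by, bx) in the output.
      x = blockIdx.y * TILE + threadIdx.x;
      y = blockIdx.x * TILE + threadIdx.y;
      out[y * n + x] = tile[threadIdx.x][threadIdx.y];
    }

With this layout the kernel would be launched with dim3 block(TILE, TILE) and dim3 grid(n / TILE, n / TILE).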

GPU memory not getting free using cudaMalloc3DArray

南笙酒味 submitted on 2020-07-09 05:21:36

Question: I am using C++, GTX1070. I am allocating a CUDA array as described:

    //variables: Vdepth = 200, Vheight = 100, Vwidth = 100, device = 0, VolumeId = 0
    cudaExtent volumeSize = make_cudaExtent(Vdepth, Vheight, Vwidth);
    cudaArray *d_volumeArray = NULL;
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<texture_type>();
    VERIFY_CALL( cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize) );
    cu_VolArray[device][VolumeId] = d_volumeArray;

Then I try to free it like this:

    VERIFY_CALL
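For what it's worth, memory obtained with cudaMalloc3DArray is released with cudaFreeArray (not cudaFree), and the free must be issued on the same device the array was allocated on. A minimal sketch of the pairing, using hypothetical names that only mirror the ones above:

    #include <cuda_runtime.h>

    // Allocate and later release a 3D CUDA array (illustrative pairing only).
    void allocateAndFreeVolume(int width, int height, int depth) {
      cudaExtent volumeSize = make_cudaExtent(width, height, depth);
      cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

      cudaArray *d_volumeArray = NULL;
      cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize);

      // ... use the array ...

      cudaFreeArray(d_volumeArray);   // cudaFree() would be the wrong call here
      d_volumeArray = NULL;
    }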

How can I use CUDA inside a gitlab-ci docker executor

好久不见. submitted on 2020-07-08 10:58:35

Question: We are using GitLab continuous integration to build and test our projects. Recently, one of the projects added a requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (Docker and gitlab-ci are working well for us), so I'd like to somehow give Docker the ability to talk to an NVIDIA GPU. Additional details:

- Installing an NVIDIA GPU on our build servers is fine - we have some spare GPUs lying around to use for that purpose
- We are not using Ubuntu or CentOS, so

cuda atomicAdd example fails to yield correct output

好久不见. submitted on 2020-07-06 02:55:21

Question: The following code was written with the goal of incrementing a 100-element array of floats by 1, ten times. In the output, I was expecting a 100-element array with the value 10.0f for each element. Instead, I get random values. Can you please point out my error here?

    __global__ void testAdd(float *a)
    {
        float temp;
        for (int i = 0; i < 100; i++) {
            a[i] = atomicAdd(&a[i], 1.0f);
        }
    }

    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);
    }

My goal is to understand the workings of atomic operations, so
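The likely culprit is the assignment: atomicAdd returns the value the location held before the addition, and writing that return value back into a[i] races with the other threads' atomic updates and overwrites them. A hedged corrected sketch, keeping the original <<<1, 10>>> launch so each of the 10 threads sweeps the whole array once:

    __global__ void testAdd(float *a)
    {
        // Each of the 10 threads adds 1.0f to every element exactly once,
        // so every element ends up incremented by 10 in total.
        for (int i = 0; i < 100; i++) {
            atomicAdd(&a[i], 1.0f);   // discard the return value (the old value)
        }
    }

    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);        // a must point to device memory
    }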

cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

自闭症网瘾萝莉.ら submitted on 2020-07-04 13:21:09

Question: I get the following error when I run TensorFlow on the GPU.

    2018-09-15 18:56:51.011724: E tensorflow/core/common_runtime/direct_session.cc:158] Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
    Traceback (most recent call last):
      File "evaluate_sample.py", line 160, in <module>
        tf.app.run(main)
      File "/anaconda3/envs/tf/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "evaluate_sample.py"
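This message means the installed NVIDIA driver is older than the CUDA runtime that the TensorFlow build was linked against, so the usual fix is to update the driver (or use a TensorFlow build targeting an older CUDA). A small hedged check to print both numbers on the machine itself:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
      int driverVersion = 0, runtimeVersion = 0;

      // Highest CUDA version the installed driver supports (0 if no driver is present).
      cudaDriverGetVersion(&driverVersion);
      // Version of the CUDA runtime this binary was built against.
      cudaRuntimeGetVersion(&runtimeVersion);

      std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                  driverVersion / 1000, (driverVersion % 100) / 10,
                  runtimeVersion / 1000, (runtimeVersion % 100) / 10);

      // The runtime only works if the driver is at least as new.
      return (driverVersion >= runtimeVersion) ? 0 : 1;
    }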