cuda

LLVM retrieve name of AllocaInst

a 夏天 submitted on 2020-07-18 07:35:08

Question: I am trying to retrieve the name of the pointer passed to a cudaMalloc call.

    CallInst *CUMallocCI = ... ; // CI of cudaMalloc call
    Value *Ptr = CUMallocCI->getOperand(0);
    if (AllocaInst *AI = dyn_cast<AllocaInst>(Ptr)) {
      errs() << AI->getName() << "\n";
    }

The above, however, just prints an empty line. Is it possible to get the pointer name out of this alloca? This is the relevant IR:

    %28 = alloca i8*, align 8
    ...
    ...
    call void @llvm.dbg.declare(metadata i8** %28, metadata !926,
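In optimized or even plain -O0 IR the alloca itself is often unnamed (%28 above), so the name usually has to be recovered from the debug intrinsics instead: the llvm.dbg.declare that references the alloca carries a DILocalVariable with the source-level name. Below is a minimal sketch of that lookup, assuming the module was built with -g so the intrinsics are present; getSourceName is a hypothetical helper, and depending on the call site you may first need Ptr->stripPointerCasts() before the dyn_cast, since cudaMalloc's operand is frequently a bitcast of the alloca.

    #include "llvm/IR/DebugInfoMetadata.h" // DILocalVariable
    #include "llvm/IR/InstIterator.h"      // instructions()
    #include "llvm/IR/Instructions.h"      // AllocaInst
    #include "llvm/IR/IntrinsicInst.h"     // DbgDeclareInst

    using namespace llvm;

    // Walk the enclosing function and look for an llvm.dbg.declare that
    // points at this alloca; its DILocalVariable holds the source name.
    static StringRef getSourceName(AllocaInst *AI) {
      Function *F = AI->getFunction();
      for (Instruction &I : instructions(*F)) {
        if (auto *DDI = dyn_cast<DbgDeclareInst>(&I)) {
          if (DDI->getAddress() == AI)
            return DDI->getVariable()->getName();
        }
      }
      return StringRef(); // no debug info found for this alloca
    }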

Tensorflow cannot open libcuda.so.1

微笑、不失礼 submitted on 2020-07-17 09:47:59

Question: I have a laptop with a GeForce 940 MX. I want to get Tensorflow up and running on the GPU. I installed everything from their tutorial page; now when I import Tensorflow, I get

    >>> import tensorflow as tf
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft
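The error in the title usually means the NVIDIA driver library (libcuda.so.1 ships with the driver, not with the CUDA toolkit) is not visible to the dynamic loader, either because no driver is installed or because its directory is missing from the library path. A small hedged check that reproduces the loader's view outside of TensorFlow (illustrative only; link with -ldl on older glibc, and the actual fix depends on your system):

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // Try to load the driver library exactly the way TensorFlow would.
      void *h = dlopen("libcuda.so.1", RTLD_NOW);
      if (!h) {
        std::printf("dlopen failed: %s\n", dlerror());
        return 1;
      }
      std::printf("libcuda.so.1 found and loaded\n");
      dlclose(h);
      return 0;
    }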

CUDA matrix transpose with shared memory

那年仲夏 submitted on 2020-07-16 08:04:49

Question: I need to implement a matrix transpose function on a GPU using shared memory. I have done it in a simple way without shared memory, which works fine, and also made an attempt with shared memory. But unfortunately the calculation is not correct and I cannot figure out why. A complete working example can be found here and at the bottom of this question. EDIT 1 I further know that the first index of the result where I have a wrong value is index 32 (of the flattened matrix, so matr[0][32] in a two dimensional
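For reference, the usual tiled pattern looks roughly like the sketch below: a minimal version assuming a 32x32 tile and matrix dimensions that are multiples of the tile size. The +1 padding column avoids shared-memory bank conflicts, and the __syncthreads() between the write and the read phase is the piece that most often goes missing when the first wrong element shows up exactly one tile width in, at index 32.

    #define TILE 32

    // Transpose an n x n row-major matrix (n assumed to be a multiple of TILE).
    __global__ void transposeShared(const float *in, float *out, int n) {
      __shared__ float tile[TILE][TILE + 1];     // +1 column avoids bank conflicts

      int x = blockIdx.x * TILE + threadIdx.x;   // column in the input
      int y = blockIdx.y * TILE + threadIdx.y;   // row in the input
      tile[threadIdx.y][threadIdx.x] = in[y * n + x];

      __syncthreads();                           // tile must be fully written first

      // Write the transposed tile: block (bx, by) lands at (by, bx) in the output.
      x = blockIdx.y * TILE + threadIdx.x;
      y = blockIdx.x * TILE + threadIdx.y;
      out[y * n + x] = tile[threadIdx.x][threadIdx.y];
    }

With this layout the kernel would be launched with dim3 block(TILE, TILE) and dim3 grid(n / TILE, n / TILE).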

GPU memory not getting free using cudaMalloc3DArray

南笙酒味 submitted on 2020-07-09 05:21:36

Question: I am using C++, GTX1070. I am allocating a CUDA array as described:

    //variables: Vdepth = 200, Vheight = 100, Vwidth = 100, device = 0, VolumeId = 0
    cudaExtent volumeSize = make_cudaExtent(Vdepth, Vheight, Vwidth);
    cudaArray *d_volumeArray = NULL;
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<texture_type>();
    VERIFY_CALL( cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize) );
    cu_VolArray[device][VolumeId] = d_volumeArray;

Then I try to free it like this:

    VERIFY_CALL
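For what it's worth, memory obtained with cudaMalloc3DArray is released with cudaFreeArray (not cudaFree), and the free must be issued on the same device the array was allocated on. A minimal sketch of the pairing, using hypothetical names that only mirror the ones above:

    #include <cuda_runtime.h>

    // Allocate and later release a 3D CUDA array (illustrative pairing only).
    void allocateAndFreeVolume(int width, int height, int depth) {
      cudaExtent volumeSize = make_cudaExtent(width, height, depth);
      cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

      cudaArray *d_volumeArray = NULL;
      cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize);

      // ... use the array ...

      cudaFreeArray(d_volumeArray);   // cudaFree() would be the wrong call here
      d_volumeArray = NULL;
    }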

How can I use CUDA inside a gitlab-ci docker executor

好久不见. submitted on 2020-07-08 10:58:35

Question: We are using GitLab continuous integration to build and test our projects. Recently, one of the projects added a requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (Docker and gitlab-ci are working well for us), so I'd like to somehow give Docker the ability to talk to an NVIDIA GPU. Additional details:

- Installing an NVIDIA GPU on our build servers is fine - we have some spare GPUs lying around to use for that purpose
- We are not using Ubuntu or CentOS, so

cuda atomicAdd example fails to yield correct output

好久不见. submitted on 2020-07-06 02:55:21

Question: The following code was written with the goal of incrementing a 100-element array of floats by 1, ten times. In the output, I was expecting a 100-element array with the value 10.0f for each element. Instead, I get random values. Can you please point out my error here?

    __global__ void testAdd(float *a)
    {
        float temp;
        for (int i = 0; i < 100; i++) {
            a[i] = atomicAdd(&a[i], 1.0f);
        }
    }

    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);
    }

My goal is to understand the workings of atomic operations, so
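The likely culprit is the assignment: atomicAdd returns the value the location held before the addition, and writing that return value back into a[i] races with the other threads' atomic updates and overwrites them. A hedged corrected sketch, keeping the original <<<1, 10>>> launch so each of the 10 threads sweeps the whole array once:

    __global__ void testAdd(float *a)
    {
        // Each of the 10 threads adds 1.0f to every element exactly once,
        // so every element ends up incremented by 10 in total.
        for (int i = 0; i < 100; i++) {
            atomicAdd(&a[i], 1.0f);   // discard the return value (the old value)
        }
    }

    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);        // a must point to device memory
    }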

cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

自闭症网瘾萝莉.ら submitted on 2020-07-04 13:21:09

Question: I get the following error when I run TensorFlow on the GPU.

    2018-09-15 18:56:51.011724: E tensorflow/core/common_runtime/direct_session.cc:158] Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
    Traceback (most recent call last):
      File "evaluate_sample.py", line 160, in <module>
        tf.app.run(main)
      File "/anaconda3/envs/tf/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "evaluate_sample.py"
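This message means the installed NVIDIA driver is older than the CUDA runtime that the TensorFlow build was linked against, so the usual fix is to update the driver (or use a TensorFlow build targeting an older CUDA). A small hedged check to print both numbers on the machine itself:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
      int driverVersion = 0, runtimeVersion = 0;

      // Highest CUDA version the installed driver supports (0 if no driver is present).
      cudaDriverGetVersion(&driverVersion);
      // Version of the CUDA runtime this binary was built against.
      cudaRuntimeGetVersion(&runtimeVersion);

      std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                  driverVersion / 1000, (driverVersion % 100) / 10,
                  runtimeVersion / 1000, (runtimeVersion % 100) / 10);

      // The runtime only works if the driver is at least as new.
      return (driverVersion >= runtimeVersion) ? 0 : 1;
    }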