cuda

CUDA compiler is unable to compile a simple test program

Submitted by 人盡茶涼 on 2021-01-20 09:19:32
Question: I am trying to get NVIDIA's CUDA set up and installed on my PC, which has an NVIDIA GEFORCE RTX 2080 SUPER graphics card. After hours of trying different things and a lot of research, I have gotten CUDA to work from the Command Prompt, but trying to use CUDA in CLion will not work. Running nvcc main.cu -o build.exe from the command line generates the executable and I can run it on the GPU; however, I get the following error when trying to use CLion. I believe this is the relevant part, however
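
For context, the kind of trivially small program involved: a hypothetical main.cu (the kernel and its names are illustrative, not taken from the question) that nvcc main.cu -o build.exe should compile and run if the toolchain is healthy:

// main.cu -- minimal CUDA sanity check (illustrative example, not from the question)
#include <cstdio>

__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();        // one block of four threads
    cudaDeviceSynchronize();  // wait for the kernel and flush device printf
    return 0;
}

If a program like this builds from the command line but not from the IDE, the difference usually lies in which host compiler and environment the IDE's CMake configuration hands to nvcc.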

How can I get the number of CUDA cores in my GPU using Python and Numba?

Submitted by ⅰ亾dé卋堺 on 2021-01-19 09:13:20
Question: I would like to know how to obtain the total number of CUDA cores in my GPU using Python, Numba and cudatoolkit. Answer 1: Most of what you need can be found by combining the information in this answer with the information in this answer. We'll use the first answer to get the device's compute capability and its number of streaming multiprocessors, and the second answer (converted to Python) to map the compute capability to a "core" count per SM, then multiply
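
A minimal sketch of that approach with Numba; the cores-per-SM table follows the widely cited lookup in CUDA's helper_cuda.h, and only a few common architectures are listed here:

# Assumes numba and cudatoolkit are installed and a CUDA GPU is present.
from numba import cuda

# Cores per SM keyed by compute capability (subset of helper_cuda.h's table).
CORES_PER_SM = {
    (3, 0): 192, (3, 5): 192, (3, 7): 192,  # Kepler
    (5, 0): 128, (5, 2): 128,               # Maxwell
    (6, 0): 64,  (6, 1): 128,               # Pascal
    (7, 0): 64,  (7, 5): 64,                # Volta, Turing
    (8, 0): 64,  (8, 6): 128,               # Ampere
}

device = cuda.get_current_device()
cc = device.compute_capability           # e.g. (7, 5)
sm_count = device.MULTIPROCESSOR_COUNT   # number of streaming multiprocessors
total_cores = CORES_PER_SM[cc] * sm_count
print(f"Compute capability {cc}: {sm_count} SMs, {total_cores} CUDA cores")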

CUDA/C - Using malloc in kernel functions gives strange results

Submitted by 自古美人都是妖i on 2021-01-18 07:32:26
Question: I'm new to CUDA/C and new to Stack Overflow; this is my first question. I'm trying to allocate memory dynamically in a kernel function, but the results are unexpected. I have read that using malloc() in a kernel can hurt performance a lot, but I need it anyway, so I first tried with a simple int ** array just to test the possibility; later I'll actually need to allocate more complex structs. In my main I used cudaMalloc() to allocate the space for the array of int *, and then I used malloc() for every
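
A condensed sketch of the pattern the question describes, with illustrative names (alloc_rows and d_rows are not from the question). One detail worth knowing: in-kernel malloc() draws from a separate device heap whose default size is small, so it is commonly enlarged with cudaDeviceSetLimit before the first launch:

#include <cstdio>

// Each thread allocates its own row on the device heap (illustrative sketch).
__global__ void alloc_rows(int **rows, int row_len) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    rows[tid] = (int *)malloc(row_len * sizeof(int));  // device-heap allocation
    if (rows[tid] == NULL) return;                     // heap exhausted
    for (int i = 0; i < row_len; ++i)
        rows[tid][i] = tid;
}

int main() {
    const int n = 256, row_len = 32;
    int **d_rows;
    cudaMalloc(&d_rows, n * sizeof(int *));  // space for the array of int *
    // Enlarge the device heap used by in-kernel malloc (here: 32 MB).
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 32 * 1024 * 1024);
    alloc_rows<<<n / 64, 64>>>(d_rows, row_len);
    cudaDeviceSynchronize();
    return 0;
}

Note that pointers returned by device-side malloc() cannot be used with host-side cudaMemcpy, which is a frequent source of "strange results" in this situation.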

MSU's H.264 Encoder Comparison (May 2012)

Submitted by 倾然丶 夕夏残阳落幕 on 2021-01-16 07:42:05
The MSU Graphics & Media Lab (Video Group) at Moscow State University has published its H.264 encoder benchmark report, the "Eighth MPEG-4 AVC/H.264 Video Codecs Comparison". The report appears every year; this is the latest edition. It measures the performance of the mainstream H.264 encoders, and the results show that the open-source x264 has already surpassed the commercial encoders in performance. The results are briefly summarized here.

1. Overview

The following encoders participated in the test:

H.264:
DivX H.264
Elecard H.264
Intel Ivy Bridge QuickSync (GPU encoder)
MainConcept H.264 (software)
MainConcept H.264 (CUDA-based encoder)
MainConcept H.264 (OpenCL-based encoder)
DiscretePhoton
x264

Non-H.264:
XviD (MPEG-4 ASP codec)

Test sequences used:

Sequence           Frames  Frame rate  Resolution
Video conferencing (5 sequences):
Deadline           1374    30          352x288
Developers 4CIF    3600    30          640x480
Developers 720p    1500    30          1280x720
Presentation       548     30          720x480

Using maximum shared memory in CUDA

Submitted by 删除回忆录丶 on 2021-01-15 11:57:51
Question: I am unable to use more than 48K of shared memory (on a V100, CUDA 10.2). I call cudaFuncSetAttribute(my_kernel, cudaFuncAttributePreferredSharedMemoryCarveout, cudaSharedmemCarveoutMaxShared); before launching my_kernel for the first time. I use launch bounds and dynamic shared memory inside my_kernel: __global__ void __launch_bounds__(768, 1) my_kernel(...) { extern __shared__ float2 sh[]; ... } The kernel is called like this: dim3 blk(32, 24); // 768 threads as in launch_bounds. my_kernel<<<grd, blk, 64
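
The carveout attribute is only a hint and does not by itself lift the 48K default; on Volta, launching with more than 48K of dynamic shared memory additionally requires an explicit opt-in via cudaFuncAttributeMaxDynamicSharedMemorySize. A minimal sketch under that assumption (grd stands in for the question's grid dimensions):

__global__ void __launch_bounds__(768, 1) my_kernel(/* ... */) {
    extern __shared__ float2 sh[];
    // ...
}

void launch(dim3 grd) {
    size_t shmem_bytes = 64 * 1024;  // e.g. 64K; V100 allows up to 96K per block
    // Required opt-in for more than 48K of dynamic shared memory:
    cudaFuncSetAttribute(my_kernel, cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)shmem_bytes);
    // Optional hint to prefer shared memory over L1 in the carveout:
    cudaFuncSetAttribute(my_kernel, cudaFuncAttributePreferredSharedMemoryCarveout,
                         cudaSharedmemCarveoutMaxShared);
    dim3 blk(32, 24);  // 768 threads, matching __launch_bounds__
    my_kernel<<<grd, blk, shmem_bytes>>>();
}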
