cuda

Difference between @cuda.jit and @jit(target='gpu')

Submitted by 百般思念 on 2021-02-07 09:13:00
Question: I have a question about working with the Python CUDA libraries from Continuum's Accelerate and numba packages. Is using the decorator @jit with target = gpu the same as @cuda.jit?

Answer 1: No, they are not the same, although the eventual compilation path through PTX into assembler is. The @jit decorator is the general compiler path, which can optionally be steered onto a CUDA device. The @cuda.jit decorator is effectively the low-level Python CUDA kernel dialect that Continuum Analytics has developed.
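
To make the distinction concrete: a @cuda.jit function is essentially a Python spelling of a raw CUDA C kernel, in which you compute the thread index and do the bounds check yourself. A minimal sketch of the equivalent native kernel (hypothetical names, not taken from the question):

```cuda
// This is the level of abstraction @cuda.jit exposes from Python:
// you compute the global thread index and guard the bounds yourself.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));  // placeholder input

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The @jit path, by contrast, compiles ordinary decorated functions and only optionally retargets them to the GPU; you never write the kernel body at this level yourself.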

CUDA - NVIDIA driver crash while running

Submitted by 北城以北 on 2021-02-07 08:39:27
Question: I run a raytracer in CUDA with N bounces (each ray will bounce N times), and I view the results using OpenGL. When N is small (1~4) everything works great. Once I make N big (~10), each thread (about 800x1000 of them) has to do a lot of computing, and that is when the screen goes black and then comes back on, with a notification that my NVIDIA driver crashed. I searched online and now think the cause is some sort of watchdog timer, since I use the same graphics card for my display and my computing (computing takes more …
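
The watchdog guess is right: on a GPU that also drives the display, kernels running longer than a couple of seconds are killed by the driver. Beyond raising the TDR timeout in the driver settings, the common workaround is to split the work into several short kernel launches, e.g. one bounce per launch. A rough sketch of that idea, with hypothetical kernel and parameter names:

```cuda
// Hypothetical sketch: launch one bounce at a time instead of all
// N bounces in a single long-running kernel, so each launch stays
// well under the display driver's watchdog limit.
__global__ void traceBounce(float4 *rays, float4 *accum,
                            int width, int height, int bounce)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    // ... one bounce of the ray tracer would go here ...
}

void render(float4 *d_rays, float4 *d_accum,
            int width, int height, int nBounces)
{
    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    for (int b = 0; b < nBounces; ++b) {
        traceBounce<<<grid, block>>>(d_rays, d_accum, width, height, b);
        cudaDeviceSynchronize();  // give the driver a chance to service the display
    }
}
```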

Golang calling CUDA library

Submitted by 柔情痞子 on 2021-02-07 05:42:23
Question: I am trying to call a CUDA function from my Go code. I have the following three files.

test.h:

```cuda
int test_add(void);
```

test.cu:

```cuda
__global__ void add(int *a, int *b, int *c)
{
    *c = *a + *b;
}

int test_add(void)
{
    int a, b, c;              // host copies of a, b, c
    int *d_a, *d_b, *d_c;     // device copies of a, b, c
    int size = sizeof(int);

    // Allocate space for device copies of a, b, c
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Setup input values
    a = 2; b = …
```
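
The detail that usually breaks setups like this is linkage: nvcc compiles test_add with C++ name mangling, so cgo cannot resolve the symbol. A hedged sketch of the fix on the CUDA side (assuming the code is built with nvcc into a shared library that Go links against):

```cuda
// test.h -- wrap the declaration so the symbol keeps C linkage when
// compiled by nvcc (C++) and stays visible to cgo (C). Sketch only.
#ifdef __cplusplus
extern "C" {
#endif
int test_add(void);
#ifdef __cplusplus
}
#endif
```

The definition in test.cu needs the same treatment, e.g. extern "C" int test_add(void) { ... }, and a typical build line is something like nvcc -shared -Xcompiler -fPIC test.cu -o libtest.so (flags vary by platform). On the Go side, cgo then links via a #cgo LDFLAGS line and a #include "test.h" in the import "C" preamble.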

Applying Sobel Edge Detection with CUDA and OpenCV on a grayscale jpg image

Submitted by 二次信任 on 2021-02-07 04:14:33
Question: This question has been asked before, but the asker didn't provide enough information, it was left unanswered, and I am curious about the program. Original Question Link. I'm trying to do Sobel edge detection using both the OpenCV and CUDA libraries. The Sobel kernel for the X direction is:

-1 0 1
-2 0 2
-1 0 1

I have 3 files in my project: main.cpp, CudaKernel.cu, and CudaKernel.h.

main.cpp:

```cpp
#include <stdlib.h>
#include <iostream>
#include <string.h>
#include <Windows.h>
#include <opencv2\core\core.hpp>
…
```
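
For reference, the arithmetic heart of a Sobel X pass in CUDA is small. A minimal sketch applying the asker's 3x3 kernel (hypothetical names; 8-bit grayscale input; border pixels skipped for simplicity):

```cuda
// Minimal Sobel X pass over an 8-bit grayscale image.
// Border pixels are left untouched for simplicity.
__global__ void sobelX(const unsigned char *src, unsigned char *dst,
                       int width, int height, size_t pitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1)
        return;

    const unsigned char *r0 = src + (y - 1) * pitch;  // row above
    const unsigned char *r1 = src + y * pitch;        // current row
    const unsigned char *r2 = src + (y + 1) * pitch;  // row below

    // Apply the -1 0 1 / -2 0 2 / -1 0 1 kernel.
    int gx = -r0[x - 1] + r0[x + 1]
           - 2 * r1[x - 1] + 2 * r1[x + 1]
           - r2[x - 1] + r2[x + 1];

    int v = gx < 0 ? -gx : gx;                 // absolute value
    dst[y * pitch + x] = (unsigned char)(v > 255 ? 255 : v);
}
```

On the OpenCV side, a grayscale cv::Mat's data pointer can be uploaded with cudaMemcpy2D (using Mat::step as the host pitch) before launching such a kernel.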

Dynamically allocating memory on the GPU

Submitted by 泪湿孤枕 on 2021-02-07 03:28:41
Question: Is it possible to dynamically allocate memory in a GPU's global memory from inside a kernel? I don't know how big my answer will be, therefore I need a way to allocate memory for each part of the answer. CUDA 4.0 allows us to use the RAM... is that a good idea, or will it reduce the speed?

Answer 1: It is possible to use malloc inside a kernel. Check the following, which is taken from the NVIDIA CUDA guide:

```cuda
__global__ void mallocTest()
{
    char *ptr = (char *)malloc(123);
    printf("Thread %d got pointer: %p\n", …
```
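
One caveat worth adding: the device-side heap is small by default, so in-kernel allocations can silently return NULL unless the limit is raised before launch. A hedged sketch completing the guide's example under that assumption:

```cuda
#include <cstdio>

__global__ void mallocTest()
{
    char *ptr = (char *)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);  // device-side allocations must be freed on the device
}

int main()
{
    // Raise the device heap limit (the default is only 8 MB) before
    // launching any kernel that calls malloc. 128 MB is an arbitrary choice.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    mallocTest<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

As to the speed question: device-side malloc is noticeably slower than pre-allocating from the host, so it is best reserved for genuinely unpredictable sizes.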

Accessing class data members from within a CUDA kernel - how to design proper host/device interaction?

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-06 12:50:54
Question: I've been trying to transform some CUDA/C code into more object-oriented code, but my goal doesn't seem easy to achieve with my current understanding of how CUDA works. I haven't been able to find a good explanation of this situation either. It might not be possible after all. I have a global object of class myClass holding an array to be filled in a kernel. How should the methods in myClass be defined so that the array and boolean members are visible from the device, and the array can …
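
A common pattern (not the only one) is to keep raw device pointers as members, allocate them in a host-side method, and pass the object to the kernel by value, since a trivially copyable object's pointer members survive the copy. A minimal sketch under those assumptions, with hypothetical member names:

```cuda
class myClass {
public:
    float *d_array;   // device pointer; the value is meaningful on both sides
    bool   flag;
    int    n;

    // Host-only setup: allocate the device array.
    void allocate(int count)
    {
        n = count;
        cudaMalloc(&d_array, n * sizeof(float));
    }

    // __host__ __device__ methods are callable from kernels when the
    // object has been passed to the kernel by value.
    __host__ __device__ float &at(int i) { return d_array[i]; }

    void release() { cudaFree(d_array); }
};

// Passing by value copies the pointer member to the device; dereferencing
// it there is legal because it points to device memory.
__global__ void fill(myClass obj, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < obj.n && obj.flag)
        obj.at(i) = value;
}

int main()
{
    myClass obj;
    obj.flag = true;
    obj.allocate(256);
    fill<<<1, 256>>>(obj, 1.0f);
    cudaDeviceSynchronize();
    obj.release();
    return 0;
}
```

A truly global object is harder: a device-side copy will not see host-side mutations, which is why pass-by-value (or an explicit cudaMemcpy of the object) is the usual route.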

How to return a single variable from a CUDA kernel function?

Submitted by 时光怂恿深爱的人放手 on 2021-02-06 00:46:10
Question: I have a CUDA search function which calculates one single variable. How can I return it to the host?

```cuda
__global__ void G_SearchByNameID(node* Node, long nodeCount, long start, char* dest, long answer) {
    answer = 2;
}
```

```cuda
cudaMemcpy(h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
cudaFree(d_answer);
```

For both of these lines I get this error:

error: argument of type "long" is incompatible with parameter of type "const void *"

Answer 1: I've been using __device__ variables for this purpose; that way you …
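
The error itself points at the fix: h_answer and d_answer are declared as plain long rather than pointers, and the kernel's answer parameter is a by-value copy that is discarded when the kernel returns. A hedged sketch of the pointer-based version (simplified kernel, hypothetical setup):

```cuda
#include <cstdio>

// The result is written through a device pointer instead of a
// by-value parameter, so the host can copy it back afterwards.
__global__ void searchKernel(long *answer)
{
    *answer = 2;
}

int main()
{
    long h_answer = 0;
    long *d_answer;
    cudaMalloc(&d_answer, sizeof(long));

    searchKernel<<<1, 1>>>(d_answer);

    // &h_answer (a pointer) satisfies cudaMemcpy's void* parameters,
    // which is exactly what the original error message complained about.
    cudaMemcpy(&h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
    cudaFree(d_answer);

    printf("answer = %ld\n", h_answer);
    return 0;
}
```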