cuda

Difference between @cuda.jit and @jit(target='gpu')

Submitted by 百般思念 on 2021-02-07 09:13:00
Question: I have a question about working with the Python CUDA libraries from Continuum's Accelerate and numba packages. Is using the decorator @jit with target = gpu the same as @cuda.jit?

Answer 1: No, they are not the same, although the eventual compilation path through PTX into assembler is. The @jit decorator is the general compiler path, which can optionally be steered onto a CUDA device. The @cuda.jit decorator is effectively the low-level Python CUDA kernel dialect that Continuum Analytics has developed.
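
To make the distinction concrete: a @cuda.jit function is essentially a Python spelling of a raw CUDA C kernel, in which you compute the thread index and do the bounds check yourself. A minimal sketch of the equivalent native kernel (hypothetical names, not taken from the question):

```cuda
// This is the level of abstraction @cuda.jit exposes from Python:
// you compute the global thread index and guard the bounds yourself.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1024;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));  // placeholder input

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The @jit path, by contrast, compiles ordinary decorated functions and only optionally retargets them to the GPU; you never write the kernel body at this level yourself.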

CUDA - NVIDIA driver crash while running

Submitted by 北城以北 on 2021-02-07 08:39:27
Question: I run a raytracer in CUDA with N bounces (each ray will bounce N times), and I view the results using OpenGL. When N is small (1~4) everything works great. Once I make N big (~10), each thread (about 800x1000 of them) has to do a lot of computing, and that is when the screen goes black and then comes back on, with a notification that my NVIDIA driver crashed. I searched online and now think the cause is some sort of watchdog timer, since I use the same graphics card for my display and my computing (computing takes more …
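
The watchdog guess is right: on a GPU that also drives the display, kernels running longer than a couple of seconds are killed by the driver. Beyond raising the TDR timeout in the driver settings, the common workaround is to split the work into several short kernel launches, e.g. one bounce per launch. A rough sketch of that idea, with hypothetical kernel and parameter names:

```cuda
// Hypothetical sketch: launch one bounce at a time instead of all
// N bounces in a single long-running kernel, so each launch stays
// well under the display driver's watchdog limit.
__global__ void traceBounce(float4 *rays, float4 *accum,
                            int width, int height, int bounce)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;
    // ... one bounce of the ray tracer would go here ...
}

void render(float4 *d_rays, float4 *d_accum,
            int width, int height, int nBounces)
{
    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    for (int b = 0; b < nBounces; ++b) {
        traceBounce<<<grid, block>>>(d_rays, d_accum, width, height, b);
        cudaDeviceSynchronize();  // give the driver a chance to service the display
    }
}
```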

Golang calling CUDA library

Submitted by 柔情痞子 on 2021-02-07 05:42:23
Question: I am trying to call a CUDA function from my Go code. I have the following three files.

test.h:

```cuda
int test_add(void);
```

test.cu:

```cuda
__global__ void add(int *a, int *b, int *c)
{
    *c = *a + *b;
}

int test_add(void)
{
    int a, b, c;              // host copies of a, b, c
    int *d_a, *d_b, *d_c;     // device copies of a, b, c
    int size = sizeof(int);

    // Allocate space for device copies of a, b, c
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Setup input values
    a = 2; b = …
```
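
The detail that usually breaks setups like this is linkage: nvcc compiles test_add with C++ name mangling, so cgo cannot resolve the symbol. A hedged sketch of the fix on the CUDA side (assuming the code is built with nvcc into a shared library that Go links against):

```cuda
// test.h -- wrap the declaration so the symbol keeps C linkage when
// compiled by nvcc (C++) and stays visible to cgo (C). Sketch only.
#ifdef __cplusplus
extern "C" {
#endif
int test_add(void);
#ifdef __cplusplus
}
#endif
```

The definition in test.cu needs the same treatment, e.g. extern "C" int test_add(void) { ... }, and a typical build line is something like nvcc -shared -Xcompiler -fPIC test.cu -o libtest.so (flags vary by platform). On the Go side, cgo then links via a #cgo LDFLAGS line and a #include "test.h" in the import "C" preamble.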

Applying Sobel Edge Detection with CUDA and OpenCV on a grayscale jpg image

Submitted by 二次信任 on 2021-02-07 04:14:33
Question: This question has been asked before, but the asker didn't provide enough information, it was left unanswered, and I am curious about the program. Original Question Link. I'm trying to do Sobel edge detection using both the OpenCV and CUDA libraries. The Sobel kernel for the X direction is:

-1 0 1
-2 0 2
-1 0 1

I have 3 files in my project: main.cpp, CudaKernel.cu, and CudaKernel.h.

main.cpp:

```cpp
#include <stdlib.h>
#include <iostream>
#include <string.h>
#include <Windows.h>
#include <opencv2\core\core.hpp>
…
```
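
For reference, the arithmetic heart of a Sobel X pass in CUDA is small. A minimal sketch applying the asker's 3x3 kernel (hypothetical names; 8-bit grayscale input; border pixels skipped for simplicity):

```cuda
// Minimal Sobel X pass over an 8-bit grayscale image.
// Border pixels are left untouched for simplicity.
__global__ void sobelX(const unsigned char *src, unsigned char *dst,
                       int width, int height, size_t pitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1)
        return;

    const unsigned char *r0 = src + (y - 1) * pitch;  // row above
    const unsigned char *r1 = src + y * pitch;        // current row
    const unsigned char *r2 = src + (y + 1) * pitch;  // row below

    // Apply the -1 0 1 / -2 0 2 / -1 0 1 kernel.
    int gx = -r0[x - 1] + r0[x + 1]
           - 2 * r1[x - 1] + 2 * r1[x + 1]
           - r2[x - 1] + r2[x + 1];

    int v = gx < 0 ? -gx : gx;                 // absolute value
    dst[y * pitch + x] = (unsigned char)(v > 255 ? 255 : v);
}
```

On the OpenCV side, a grayscale cv::Mat's data pointer can be uploaded with cudaMemcpy2D (using Mat::step as the host pitch) before launching such a kernel.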

Dynamically allocating memory on the GPU

Submitted by 泪湿孤枕 on 2021-02-07 03:28:41
Question: Is it possible to dynamically allocate memory in a GPU's global memory from inside a kernel? I don't know how big my answer will be, therefore I need a way to allocate memory for each part of the answer. CUDA 4.0 allows us to use the RAM... is that a good idea, or will it reduce the speed?

Answer 1: It is possible to use malloc inside a kernel. Check the following, which is taken from the NVIDIA CUDA guide:

```cuda
__global__ void mallocTest()
{
    char *ptr = (char *)malloc(123);
    printf("Thread %d got pointer: %p\n", …
```
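
One caveat worth adding: the device-side heap is small by default, so in-kernel allocations can silently return NULL unless the limit is raised before launch. A hedged sketch completing the guide's example under that assumption:

```cuda
#include <cstdio>

__global__ void mallocTest()
{
    char *ptr = (char *)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);  // device-side allocations must be freed on the device
}

int main()
{
    // Raise the device heap limit (the default is only 8 MB) before
    // launching any kernel that calls malloc. 128 MB is an arbitrary choice.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    mallocTest<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

As to the speed question: device-side malloc is noticeably slower than pre-allocating from the host, so it is best reserved for genuinely unpredictable sizes.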

Accessing class data members from within a CUDA kernel - how to design proper host/device interaction?

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-06 12:50:54
Question: I've been trying to transform some CUDA/C code into more object-oriented code, but my goal doesn't seem easy to achieve with my current understanding of how CUDA works. I haven't been able to find a good explanation of this situation either. It might not be possible after all. I have a global object of class myClass holding an array to be filled in a kernel. How should the methods in myClass be defined so that the array and boolean members are visible from the device, and the array can …
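
A common pattern (not the only one) is to keep raw device pointers as members, allocate them in a host-side method, and pass the object to the kernel by value, since a trivially copyable object's pointer members survive the copy. A minimal sketch under those assumptions, with hypothetical member names:

```cuda
class myClass {
public:
    float *d_array;   // device pointer; the value is meaningful on both sides
    bool   flag;
    int    n;

    // Host-only setup: allocate the device array.
    void allocate(int count)
    {
        n = count;
        cudaMalloc(&d_array, n * sizeof(float));
    }

    // __host__ __device__ methods are callable from kernels when the
    // object has been passed to the kernel by value.
    __host__ __device__ float &at(int i) { return d_array[i]; }

    void release() { cudaFree(d_array); }
};

// Passing by value copies the pointer member to the device; dereferencing
// it there is legal because it points to device memory.
__global__ void fill(myClass obj, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < obj.n && obj.flag)
        obj.at(i) = value;
}

int main()
{
    myClass obj;
    obj.flag = true;
    obj.allocate(256);
    fill<<<1, 256>>>(obj, 1.0f);
    cudaDeviceSynchronize();
    obj.release();
    return 0;
}
```

A truly global object is harder: a device-side copy will not see host-side mutations, which is why pass-by-value (or an explicit cudaMemcpy of the object) is the usual route.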

How to return a single variable from a CUDA kernel function?

Submitted by 时光怂恿深爱的人放手 on 2021-02-06 00:46:10
Question: I have a CUDA search function which calculates one single variable. How can I return it to the host?

```cuda
__global__ void G_SearchByNameID(node* Node, long nodeCount, long start, char* dest, long answer) {
    answer = 2;
}
```

```cuda
cudaMemcpy(h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
cudaFree(d_answer);
```

For both of these lines I get this error:

error: argument of type "long" is incompatible with parameter of type "const void *"

Answer 1: I've been using __device__ variables for this purpose; that way you …
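
The error itself points at the fix: h_answer and d_answer are declared as plain long rather than pointers, and the kernel's answer parameter is a by-value copy that is discarded when the kernel returns. A hedged sketch of the pointer-based version (simplified kernel, hypothetical setup):

```cuda
#include <cstdio>

// The result is written through a device pointer instead of a
// by-value parameter, so the host can copy it back afterwards.
__global__ void searchKernel(long *answer)
{
    *answer = 2;
}

int main()
{
    long h_answer = 0;
    long *d_answer;
    cudaMalloc(&d_answer, sizeof(long));

    searchKernel<<<1, 1>>>(d_answer);

    // &h_answer (a pointer) satisfies cudaMemcpy's void* parameters,
    // which is exactly what the original error message complained about.
    cudaMemcpy(&h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
    cudaFree(d_answer);

    printf("answer = %ld\n", h_answer);
    return 0;
}
```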