pycuda

Why is my rather trivial CUDA program erring with certain arguments?

Submitted 2019-12-13 00:38:09
Question: I made a simple CUDA program for practice. It simply copies data from one array to another:

```python
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule

# Global constants
N = 2**20  # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512

# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)

# Create the CUDA kernel, and run it.
mod = SourceModule("""
__global__
```
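The snippet cuts off above, but one detail already visible is a likely culprit: on Python 3, `/` is true division, so `N / block_size_x` yields a float, and PyCUDA rejects non-integer grid dimensions. A minimal host-side sketch of the pitfall and the `//` fix (no GPU needed):

```python
N = 2**20          # size of the array, as in the question
block_size_x = 512

bad_grid_x = N / block_size_x    # float on Python 3 -- PyCUDA will complain
good_grid_x = N // block_size_x  # plain int, safe to pass as a grid dimension

print(bad_grid_x, good_grid_x)   # 2048.0 2048
```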

How should I interpret this CUDA error?

Submitted 2019-12-12 22:06:08
Question: I am teaching myself CUDA with PyCUDA. In this exercise, I want to send a simple array of 1024 floats to the GPU and store it in shared memory. As I specify in my arguments below, I run this kernel on just a single block with 1024 threads.

```python
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit
import numpy as np
import matplotlib.pyplot as plt

arrayOfFloats = np.float64(np.random.sample(1024))
mod = SourceModule("""
__global__ void myVeryFirstKernel
```
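The question is cut off, but a common stumbling block with this setup is the dtype mismatch: the host array above is float64, while a kernel parameter declared `float` expects float32. A hedged sketch of a shared-memory staging kernel, with the GPU launch kept in a function so the host-side part runs anywhere (kernel and function names are illustrative):

```python
import numpy as np

SHARED_KERNEL = r"""
__global__ void stage_in_shared(float *out, float *in)
{
    __shared__ float buf[1024];   // one float per thread in the block
    int i = threadIdx.x;
    buf[i] = in[i];
    __syncthreads();
    out[i] = buf[i];
}
"""

# float32, not float64, to match the kernel's "float" parameters
arrayOfFloats = np.random.random(1024).astype(np.float32)

def run_on_gpu():
    # Requires a CUDA-capable GPU and nvcc.
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule
    mod = SourceModule(SHARED_KERNEL)
    out = np.empty_like(arrayOfFloats)
    mod.get_function("stage_in_shared")(
        cuda.Out(out), cuda.In(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1))
    return out
```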

PyCUDA context error when using Flask

Submitted 2019-12-12 13:16:45
Question: I am using PyCUDA to implement the smooth_local_affine as shown here. It works well when I simply run the program on Linux. But when I tried to import it in a Flask context:

```python
from smooth_local_affine import smooth_local_affine
from flask import Flask
app = Flask(__name__)
...
```

the following error occurs:

```
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------
```
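This error usually means a context pushed by `pycuda.autoinit` (or created manually) was still on the stack when the interpreter shut down. One common remedy, sketched here as an assumption rather than taken from the question, is to skip `pycuda.autoinit` and manage the context by hand so it is popped as soon as the GPU work finishes:

```python
def with_own_cuda_context(gpu_work):
    """Run gpu_work() inside a context that is popped afterwards.

    gpu_work is any callable that makes PyCUDA calls; requires a CUDA GPU.
    """
    import pycuda.driver as cuda
    cuda.init()
    ctx = cuda.Device(0).make_context()   # pushes a fresh context
    try:
        return gpu_work()
    finally:
        ctx.pop()                         # leave the stack empty for cleanup
        ctx.detach()
```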

Print messages in PyCUDA

Submitted 2019-12-12 11:09:56
Question: In simple CUDA programs we can print messages from threads by including cuPrintf.h, but how to do this in PyCUDA is not explained anywhere. How can I do it?

Answer 1: On GPUs of compute capability 2.0 and later, cuPrintf.h is discouraged in favor of CUDA's built-in printf(). To use it, just #include <stdio.h> and call printf() exactly as you would on the host. The PyCUDA wiki has a specific example of this.

Source: https://stackoverflow.com/questions/11905841/print-messages-in-pycuda
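A minimal sketch of the approach the answer describes; the kernel name is illustrative, and actually running it requires a CUDA GPU, so the launch is kept in a function:

```python
PRINTF_KERNEL = r"""
#include <stdio.h>

__global__ void say_hello()
{
    printf("Hello from thread %d\n", threadIdx.x);
}
"""

def run_on_gpu():
    # Requires a CUDA-capable GPU of compute capability >= 2.0.
    import pycuda.autoinit
    from pycuda.compiler import SourceModule
    mod = SourceModule(PRINTF_KERNEL)
    mod.get_function("say_hello")(block=(4, 1, 1), grid=(1, 1))
```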

Create arrays in shared memory w/o templates like in PyOpenCL

Submitted 2019-12-11 19:29:57
Question: How can I create an array in shared memory without modifying the kernel using templates, as seen in the official examples? Or is using templates the official way? In PyOpenCL I can create an array in local memory by setting a kernel argument:

```python
kernel.set_arg(1, numpy.uint32(a_width))
...
KERNEL_CODE = """
__kernel void matrixMul(__local float* A_temp, ...)
{ ... }
"""
```

Answer 1: CUDA supports dynamic shared memory allocation at kernel run time, but the mechanism is a bit different from OpenCL. In the
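The answer is cut off above; as a hedged sketch of what the CUDA-side mechanism looks like, an unsized `extern __shared__` array plays the role of OpenCL's `__local` argument, and PyCUDA supplies its byte size at launch through the `shared=` keyword (the kernel body and launch shapes here are illustrative):

```python
import numpy as np

DYN_SHARED_KERNEL = r"""
__global__ void matrixMul(float *C, float *A, float *B, int a_width)
{
    extern __shared__ float A_temp[];   // sized at launch, not compile time
    /* ... use A_temp as scratch space ... */
}
"""

def shared_bytes(a_width):
    # Dynamic shared memory needed for a_width floats, in bytes.
    return int(a_width * np.dtype(np.float32).itemsize)

def launch_on_gpu(C_gpu, A_gpu, B_gpu, a_width):
    # Requires a CUDA GPU; note the shared= argument on the launch call.
    import pycuda.autoinit
    from pycuda.compiler import SourceModule
    func = SourceModule(DYN_SHARED_KERNEL).get_function("matrixMul")
    func(C_gpu, A_gpu, B_gpu, np.int32(a_width),
         block=(16, 16, 1), grid=(1, 1), shared=shared_bytes(a_width))
```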

Inplace transpose of 3D array in PyCuda

Submitted 2019-12-11 12:09:54
Question: I have a 3D array and would like to transpose its first two dimensions (x and y), but not the third (z). On a 3D array A I want the same result as numpy's A.transpose((1, 0, 2)). Specifically, I want to get the "transposed" global threadIdx. The code below is supposed to write the transposed index at the untransposed location in the 3D array A. It doesn't. Any advice?

```python
import numpy as np
from pycuda import compiler, gpuarray
import pycuda.driver as cuda
import pycuda.autoinit

kernel_code = """
_
```
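The kernel source is cut off above, but the index arithmetic it needs can be checked without a GPU: for an array of shape `(nx, ny, nz)`, `A.transpose((1, 0, 2))` sends the element at flat offset `x*ny*nz + y*nz + z` to offset `y*nx*nz + x*nz + z`. A pure NumPy sketch of that mapping, which each kernel thread would reproduce with its own (x, y, z) indices:

```python
import numpy as np

nx, ny, nz = 3, 4, 2
A = np.arange(nx * ny * nz, dtype=np.float32).reshape(nx, ny, nz)

flat_in = A.ravel()
flat_out = np.empty_like(flat_in)
for x in range(nx):
    for y in range(ny):
        for z in range(nz):
            in_idx = x * ny * nz + y * nz + z    # untransposed offset
            out_idx = y * nx * nz + x * nz + z   # transposed offset
            flat_out[out_idx] = flat_in[in_idx]

# The mapping reproduces numpy's transpose of the first two axes:
assert np.array_equal(flat_out.reshape(ny, nx, nz), A.transpose((1, 0, 2)))
```

Note this writes to a separate output buffer; a genuinely in-place transpose of a non-square slab requires a cycle-following algorithm and is considerably harder.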

ExecError: error invoking 'nvcc --version': [Errno 2] No such file or directory: 'nvcc': 'nvcc'

Submitted 2019-12-11 07:33:24
Question: I am trying this code in Spyder (Python 3):

```python
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
multiply_them(drv.Out
```
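The error in the title means PyCUDA shelled out to `nvcc` and could not find it, which typically happens when the IDE does not inherit the shell's PATH. One hedged workaround is to append the CUDA toolkit's bin directory to PATH from Python before compiling; the directory below is an assumption, so point it at your actual install:

```python
import os

CUDA_BIN = "/usr/local/cuda/bin"   # hypothetical toolkit location -- adjust

path = os.environ.get("PATH", "")
if CUDA_BIN not in path.split(os.pathsep):
    # Must happen before pycuda.compiler.SourceModule invokes nvcc.
    os.environ["PATH"] = path + os.pathsep + CUDA_BIN

print(CUDA_BIN in os.environ["PATH"])  # True
```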

PyCUDA either fails to find function in NVIDIA source code or throws 'may not have extern “C” Linkage' error

Submitted 2019-12-11 06:55:45
Question: I am trying to use (and learn from) Mark Harris's optimized reduction kernel by copying his source code into a simple PyCUDA application (the full source of my attempt is listed below). Unfortunately, I run into one of the two following errors. The CUDA kernel does not compile, throwing this error message:

```
kernel.cu(3): error: this declaration may not have extern "C" linkage
```

If I include the argument no_extern_c=True in the line that compiles the kernel, the following error is raised:
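The question is truncated, but the two errors point at a known tension: templated CUDA code cannot be given C linkage, while PyCUDA's `get_function()` needs an unmangled name. A hedged sketch of the usual middle ground, passing `no_extern_c=True` and wrapping only the `__global__` entry point in `extern "C"` (the kernel contents are illustrative, not Harris's actual code):

```python
REDUCE_SRC = r"""
template <typename T>
__device__ T my_add(T a, T b) { return a + b; }   // templates need C++ linkage

extern "C" __global__ void reduce_f(float *out, const float *in)
{
    // The C-linkage wrapper can still call templated device code.
    out[0] = my_add(in[0], in[1]);
}
"""

def compile_on_gpu():
    # Requires a CUDA GPU and nvcc.
    import pycuda.autoinit
    from pycuda.compiler import SourceModule
    mod = SourceModule(REDUCE_SRC, no_extern_c=True)
    return mod.get_function("reduce_f")   # name stays unmangled
```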

Calling __host__ functions in PyCUDA

Submitted 2019-12-11 05:26:54
Question: Is it possible to call __host__ functions in PyCUDA the way you can call __global__ functions? I noticed in the documentation that pycuda.driver.Function creates a handle to a __global__ function. __device__ functions can be called from a __global__ function, but __host__ code cannot. I'm aware that using a __host__ function pretty much defeats the purpose of PyCUDA, but there are some ready-made functions that I'd like to import and call as a proof of concept. As a note, whenever I try to import
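PyCUDA itself only launches `__global__` kernels, so a `__host__` function has to be reached another way. One common workaround, offered here as an assumption rather than something from the (truncated) question, is to compile the host code into a shared library and call it through ctypes; the library and function names below are hypothetical:

```python
import ctypes

def load_host_function(libpath, funcname):
    # libpath: e.g. "./libhost.so", built with something like
    #   nvcc -Xcompiler -fPIC -shared host_code.cu -o libhost.so
    # funcname: an extern "C" symbol exported by that library.
    lib = ctypes.CDLL(libpath)
    return getattr(lib, funcname)
```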

pyCUDA can't print result

Submitted 2019-12-10 18:16:36
Question: Recently I used pip to install PyCUDA for my Python 3.4.3. But when I test the sample code (https://documen.tician.de/pycuda/tutorial.html#getting-started), it doesn't print the result; the program simply ends without any error message. I can't understand what's wrong with this code or my Python. Thank you all for answering. This is my code:

```python
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
import random

a = [random.randint(0, 20) for i in
```
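The snippet is cut off, but one visible pitfall is that `a` starts life as a plain Python list of ints, while the tutorial's `doublify` kernel operates on float32 data; the host must also copy the result back before printing it. A hedged reconstruction of the tutorial flow with explicit dtype handling (GPU launch kept in a function so the rest runs anywhere):

```python
import random
import numpy as np

# Convert the list to float32 before uploading -- the kernel expects floats.
a = np.array([random.randint(0, 20) for i in range(16)], dtype=np.float32)

DOUBLE_KERNEL = r"""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x;
    a[idx] *= 2;
}
"""

def run_on_gpu(host_array):
    # Requires a CUDA-capable GPU.
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule
    a_gpu = cuda.mem_alloc(host_array.nbytes)
    cuda.memcpy_htod(a_gpu, host_array)
    func = SourceModule(DOUBLE_KERNEL).get_function("doublify")
    func(a_gpu, block=(int(host_array.size), 1, 1), grid=(1, 1))
    result = np.empty_like(host_array)
    cuda.memcpy_dtoh(result, a_gpu)
    print(result)   # print the host copy, not the device pointer
    return result
```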