pycuda

getrs function of cuSolver over pycuda doesn't work properly

Submitted by 懵懂的女人 on 2019-12-01 12:47:16
Question: I'm trying to make a PyCUDA wrapper, inspired by the scikits-cuda library, for some of the operations provided in NVIDIA's new cuSolver library. I want to solve a linear system of the form AX=B by LU factorization. To do that, I first use the cublasSgetrfBatched method from scikits-cuda, which gives me the LU factorization; then, with that factorization, I want to solve the system using cusolverDnSgetrs from cuSolver, which I want to wrap. When I perform the computation it returns status 3, and the matrices that…
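A status of 3 from cuSOLVER commonly corresponds to CUSOLVER_STATUS_INVALID_VALUE (a bad parameter, often a row-major matrix passed where column-major is expected, or a wrong leading dimension). For debugging a wrapper on small inputs, it helps to have a CPU reference for what getrf/getrs compute. Below is a minimal pure-Python sketch of LU factorization with partial pivoting (getrf) followed by pivoted forward/back substitution (getrs); note that it is an illustration only — LAPACK/cuSOLVER store matrices column-major and use 1-based pivot indices, unlike this sketch.

```python
def getrf(A):
    """In-place LU factorization with partial pivoting.
    Returns the pivot index list (0-based, unlike LAPACK's 1-based ipiv)."""
    n = len(A)
    ipiv = []
    for k in range(n):
        # Choose the pivot row for column k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        ipiv.append(p)
        if p != k:
            A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]          # multiplier stored in place of L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return ipiv

def getrs(A, ipiv, b):
    """Solve A x = b using the packed LU factors and pivots from getrf."""
    n = len(A)
    x = list(b)
    for k in range(n):                   # apply the row interchanges
        p = ipiv[k]
        x[k], x[p] = x[p], x[k]
    for i in range(n):                   # forward solve L y = P b
        for j in range(i):
            x[i] -= A[i][j] * x[j]
    for i in reversed(range(n)):         # back solve U x = y
        for j in range(i + 1, n):
            x[i] -= A[i][j] * x[j]
        x[i] /= A[i][i]
    return x

A = [[4.0, 3.0], [6.0, 3.0]]
ipiv = getrf(A)
x = getrs(A, ipiv, [10.0, 12.0])   # solves 4a+3b=10, 6a+3b=12
```

Comparing a GPU wrapper's output against a reference like this (or numpy.linalg.solve) on a tiny system quickly separates "wrong parameters" from "wrong memory layout".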

How to generate random number inside pyCUDA kernel?

Submitted by 谁都会走 on 2019-12-01 09:33:48
I am using PyCUDA for CUDA programming. I need to use random numbers inside a kernel function. The CURAND library doesn't work inside it (PyCUDA). Since there is a lot of work to be done on the GPU, generating the random numbers on the CPU and then transferring them to the GPU won't work; it would rather defeat the purpose of using a GPU. Supplementary questions: Is there a way to allocate memory on the GPU using 1 block and 1 thread? I am using more than one kernel. Do I need to use multiple SourceModule blocks? talonmies: Despite what you assert in your question, PyCUDA has pretty comprehensive support for CURAND. The GPUArray…
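The key idea behind device-side random numbers (which CURAND's device API implements via curand_init seeded from a seed and a per-thread sequence number) is that every thread derives its own independent stream from a shared seed, so no numbers ever cross the PCIe bus. A pure-Python sketch of that concept, using a splitmix64-style bit mixer — this illustrates the per-thread-stream idea only and is not the CURAND algorithm:

```python
MASK64 = (1 << 64) - 1

def mix64(z):
    """splitmix64 finalizer: scrambles a 64-bit counter into output bits."""
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9 & MASK64
    z = (z ^ (z >> 27)) * 0x94D049BB133111EB & MASK64
    return (z ^ (z >> 31)) & MASK64

def thread_uniform(seed, thread_id, n):
    """The stream "thread" thread_id would generate: n floats in [0, 1).
    Each thread's base state depends on (seed, thread_id), so streams are
    decorrelated without any CPU->GPU transfer of numbers."""
    base = mix64(seed ^ (thread_id * 0x9E3779B97F4A7C15 & MASK64))
    return [mix64(base + k + 1) / 2.0**64 for k in range(n)]

# Two "threads" produce distinct, reproducible streams from one shared seed.
s0 = thread_uniform(12345, 0, 4)
s1 = thread_uniform(12345, 1, 4)
```

In real PyCUDA code the same effect comes for free from pycuda.curandom (host-side generators filling GPUArrays) or from calling CURAND's device API inside a SourceModule kernel.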

How do I diagnose a CUDA launch failure due to being out of resources?

Submitted by 空扰寡人 on 2019-12-01 06:38:47
I'm getting an out-of-resources error when trying to launch a CUDA kernel (through PyCUDA), and I'm wondering if it's possible to get the system to tell me which resource it is that I'm short on. Obviously the system knows what resource has been exhausted; I just want to query that as well. I've used the occupancy calculator, and everything seems okay, so either there's a corner case not covered or I'm using it wrong. I know it's not registers (which seem to be the usual culprit), because I'm using <= 63 and it still fails with a 1x1x1 block and a 1x1 grid on a CC 2.1 device. Thanks for any…
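With a 1x1x1 block on a 1x1 grid, occupancy cannot be the issue, so the remaining static suspects are per-thread local memory (stack) and shared memory. PyCUDA exposes the compiled kernel's demands as attributes on the function object (func.num_regs, func.shared_size_bytes, func.local_size_bytes), which can be checked against the device limits by hand. The sketch below does that check with hypothetical CC 2.x limit values — the numbers are assumptions for illustration, not queried from a device:

```python
# Assumed CC 2.x per-block/per-thread limits, for illustration only.
CC20_LIMITS = {
    "regs_per_block": 32768,        # 32K 32-bit registers per SM
    "shared_per_block": 49152,      # 48 KiB shared memory
    "local_per_thread": 524288,     # 512 KiB local memory per thread
    "threads_per_block": 1024,
}

def launch_problems(num_regs, shared_bytes, local_bytes, block_threads,
                    limits=CC20_LIMITS):
    """Return human-readable reasons a launch could run out of resources."""
    problems = []
    if block_threads > limits["threads_per_block"]:
        problems.append("too many threads per block")
    if num_regs * block_threads > limits["regs_per_block"]:
        problems.append("register file exhausted")
    if shared_bytes > limits["shared_per_block"]:
        problems.append("shared memory per block exceeded")
    if local_bytes > limits["local_per_thread"]:
        problems.append("local memory (stack) per thread exceeded")
    return problems

# A 1x1x1 block with 63 registers passes every static check, matching the
# questioner's observation that the occupancy numbers look fine; when that
# happens, the usual next suspects are dynamic shared memory passed at
# launch time and device-side stack/heap usage.
```

Usage: feed in the attributes PyCUDA reports for the compiled function, e.g. launch_problems(func.num_regs, func.shared_size_bytes, func.local_size_bytes, 1).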

Installing pycuda-2013.1.1 on windows 7 64 bit

Submitted by 狂风中的少年 on 2019-12-01 04:54:08
Question: FYI, I have the 64-bit version of Python 2.7, and I followed the PyCUDA installation instructions to install PyCUDA. I don't have any problem running the following script:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)

But after that, when executing this statement:

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int…
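This failure pattern is telling: pycuda.autoinit and mem_alloc only need the CUDA driver, but SourceModule shells out to nvcc, which on Windows additionally needs Microsoft's cl.exe on PATH. A missing or unreachable compiler is the classic reason a script dies exactly at the SourceModule line. A small pre-flight check (a sketch for Python 3; the tool names are the usual ones, adjust for your toolchain):

```python
import shutil

def missing_tools(tools=("nvcc", "cl"), which=shutil.which):
    """Return the compiler executables that cannot be found on PATH."""
    return [t for t in tools if which(t) is None]

def preflight(which=shutil.which):
    """Report whether SourceModule's compile step is likely to work."""
    missing = missing_tools(which=which)
    if missing:
        return "SourceModule will likely fail; not on PATH: " + ", ".join(missing)
    return "toolchain looks OK"
```

On Windows this typically means adding the Visual Studio VC\bin directory (for cl.exe) and the CUDA bin directory (for nvcc) to PATH before launching Python.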

PyCUDA+Threading = Invalid Handles on kernel invocations

Submitted by 感情迁移 on 2019-11-29 09:00:33
I'll try and make this clear. I've got two classes: GPU(Object), for general access to GPU functionality, and multifunc(threading.Thread), for a particular function I'm trying to multi-device-ify. GPU contains most of the 'first time' processing needed for all subsequent use cases, so multifunc gets called from GPU with its self instance passed as an __init__ argument (along with the usual queues and such). Unfortunately, multifunc craps out with:

File "/home/bolster/workspace/project/gpu.py", line 438, in run
    prepare(d_A,d_B,d_XTG,offset,grid=N_grid,block=N_block)
File "/usr/local/lib/python2…
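The invalid-handle symptom usually comes down to CUDA contexts being bound to the thread that made them current: module, function, and memory handles created in the main thread (here, inside GPU) are not valid inside a threading.Thread unless that thread pushes the same context or creates its own. A pure-Python sketch of the per-thread-resource pattern involved — the Context class here is a stand-in for illustration, not PyCUDA's API:

```python
import threading

class Context:
    """Stand-in for a CUDA context: handles are only valid within it."""
    def __init__(self, owner):
        self.owner = owner

_tls = threading.local()

def current_context():
    """Each thread lazily gets its own context (in real PyCUDA code this
    is where you would create or push a context for the calling thread)."""
    if not hasattr(_tls, "ctx"):
        _tls.ctx = Context(owner=threading.current_thread().name)
    return _tls.ctx

results = {}

def worker():
    # Inside the thread, current_context() is NOT the main thread's.
    results[threading.current_thread().name] = current_context().owner

main_ctx = current_context()
t = threading.Thread(target=worker, name="multifunc-0")
t.start()
t.join()
# The worker ended up with its own context, distinct from the main
# thread's -- which is exactly why handles created during GPU's 'first
# time' setup appear invalid when used from multifunc.
```

In real code the fix is either to do all CUDA work in one thread, or to have each worker thread push the shared context (ctx.push()/ctx.pop() in PyCUDA) around its kernel invocations.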

How to profile PyCuda code with the Visual Profiler?

Submitted by ◇◆丶佛笑我妖孽 on 2019-11-27 09:10:32
When I create a new session and tell the Visual Profiler to launch my Python/PyCUDA scripts, I get the following error message:

Execution run #1 of program '' failed, exit code: 255

These are my preferences:

Launch: python "/pathtopycudafile/mysuperkernel.py"
Working Directory: "/pathtopycudafile/mysuperkernel.py"
Arguments: [empty]

I use CUDA 4.0 under Ubuntu 10.10, 64-bit. Profiling compiled examples works. P.S.: I am aware of the SO question "How to profile PyCuda code in Linux?", but that seems to be an unrelated problem. Minimal example pycudaexample.py:

import pycuda.autoinit
import pycuda.driver as drv…
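One concrete bug in the preferences above: "Working Directory" is set to the script file itself rather than to its containing directory, and the profiler cannot chdir into a file (exit code 255 is typically just a generic launch failure). Deriving the correct value:

```python
import os

script = "/pathtopycudafile/mysuperkernel.py"

# The working directory must be the directory containing the script,
# not the script path itself.
workdir = os.path.dirname(script)   # "/pathtopycudafile"
assert os.path.basename(script) == "mysuperkernel.py"
assert not workdir.endswith(".py")
```

So the session preferences should read Working Directory: "/pathtopycudafile" while Launch keeps the full script path.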