jcuda | 易学教程

How can I pass a struct to a kernel in JCuda

Question: I have already looked at http://www.javacodegeeks.com/2011/10/gpgpu-with-jcuda-good-bad-and-ugly.html, which says I must modify my kernel to take only single-dimensional arrays. However, I refuse to believe that it is impossible to create a struct and copy it to device memory in JCuda. I would imagine the usual implementation would be to create a case class (Scala terminology) that extends some native API, which can then be turned into a struct that can be safely passed into the kernel.
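One way to approach the question above, sketched under assumptions rather than taken from any answer: lay the struct's fields out by hand in a direct `ByteBuffer` that uses the native byte order, so the bytes match what the device code expects. The `Record` struct, its field names, and its 16-byte layout are hypothetical. In JCuda, such a buffer could presumably then be wrapped with `Pointer.to(buffer)` and copied with `cuMemcpyHtoD`; that step is omitted here so the sketch runs without a GPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StructPacking {
    // Hypothetical device-side layout mirrored by this sketch:
    //   struct Record { int id; float score; long long timestamp; };
    // With natural alignment: id at offset 0, score at 4, timestamp at 8.
    static final int RECORD_SIZE = 16;

    // Packs one record into a direct buffer in the platform's byte order.
    static ByteBuffer pack(int id, float score, long timestamp) {
        ByteBuffer buf = ByteBuffer.allocateDirect(RECORD_SIZE)
                                   .order(ByteOrder.nativeOrder());
        buf.putInt(0, id);          // absolute writes leave the position at 0
        buf.putFloat(4, score);
        buf.putLong(8, timestamp);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(7, 0.5f, 1234567890123L);
        System.out.println(buf.capacity() + " bytes, id=" + buf.getInt(0));
        // prints "16 bytes, id=7"
    }
}
```

The offsets have to agree with the padding the CUDA compiler applies to the real struct, so a struct with mixed field sizes needs its layout checked on the C side first.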

JNI libraries deallocate memory upon garbage collection?

Question: I am using JCuda and would like to know whether the JNI objects are smart enough to deallocate themselves when they are garbage collected. I can understand why this may not work in all situations, but I know it will work in my situation, so my follow-up question is: how can I accomplish this? Is there a "mode" I can set? Will I need to build a layer of abstraction? Or maybe the answer really is "no, don't ever try that", in which case, why not? EDIT: I'm referring only to native objects created via JNI, not Java
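The "layer of abstraction" the question mentions could, as one general Java option, be built on `java.lang.ref.Cleaner` (Java 9+). The sketch below is not JCuda-specific: `NativeBuffer`, the `handle` value, and the `FREED` counter are hypothetical stand-ins, and the place where a real wrapper would call the native free function is only marked with a comment.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

final class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stand-in for native bookkeeping: counts simulated frees.
    static final AtomicInteger FREED = new AtomicInteger();

    // The cleanup action must not reference the NativeBuffer itself,
    // otherwise the wrapper could never become unreachable.
    private static final class Release implements Runnable {
        private final long handle;
        Release(long handle) { this.handle = handle; }
        @Override public void run() {
            // A real wrapper would pass `handle` to the native free call here.
            FREED.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(long handle) {
        // Runs Release once: on close(), or after the wrapper is collected.
        this.cleanable = CLEANER.register(this, new Release(handle));
    }

    @Override public void close() {
        cleanable.clean(); // idempotent; the action runs at most once
    }

    public static void main(String[] args) {
        try (NativeBuffer b = new NativeBuffer(42L)) {
            // use the buffer
        }
        System.out.println("freed=" + FREED.get()); // prints "freed=1"
    }
}
```

Garbage collection is driven by Java heap pressure and knows nothing about how much native or device memory is held, so explicit `close()` stays the primary path in this design and the cleaner is only a safety net.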

What is the easiest way to run working CUDA code in Java?

I have some CUDA code that I wrote in C, and it seems to be working fine (it's plain old C, not C++). I'm running a Hadoop cluster and want to consolidate my code, so ideally I'd like to run it from within Java (long story short: the system is too complex). Currently, the C program parses a log file, takes a few thousand lines, processes each line in parallel on the GPU, saves specific errors/transactions into a linked list, and then writes them to the drive. What is the best approach to do this? Is JCuda a perfect mapping to C CUDA, or is it totally different? Or does it make sense to call C code

Loading multiple modules in JCuda is not working

In JCuda, one can load CUDA files in PTX or CUBIN format and call (launch) __global__ functions (kernels) from Java. With that in mind, I want to develop a framework with JCuda that takes a user's __device__ function in a .cu file at run time, loads it, and runs it. I have already implemented a __global__ function in which each thread finds the start point of its related data, performs some computation and initialization, and then calls the user's __device__ function. Here is my kernel pseudocode: extern "C" __device__ void userFunc(args); extern "C" __global__ void kernel(){ // initialize

JIT in JCuda, loading multiple ptx modules

I said in this question that I had a problem loading PTX modules in JCuda. Following @talonmies's idea, I implemented a JCuda version of his solution to load multiple PTX files and load them as a single module. Here is the related part of the code:
import static jcuda.driver.JCudaDriver.cuLinkAddFile;
import static jcuda.driver.JCudaDriver.cuLinkComplete;
import static jcuda.driver.JCudaDriver.cuLinkCreate;
import static jcuda.driver.JCudaDriver.cuLinkDestroy;
import static jcuda.driver.JCudaDriver.cuModuleGetFunction;
import static jcuda.driver.JCudaDriver.cuModuleLoadData;
import jcuda