jcuda

What is the easiest way to run working CUDA code in Java?

自作多情 submitted on 2019-12-21 20:57:40
Question: I have some CUDA code I wrote in C, and it seems to be working fine (it's plain old C, not C++). I'm running a Hadoop cluster and want to consolidate my code, so ideally I'd like to run it from within Java (long story short: the system is too complex). Currently the C program parses a log file, takes a few thousand lines, processes each line in parallel on the GPU, saves specific errors/transactions into a linked list, and then writes them to disk. What is the best approach to do this?

How can I pass a struct to a kernel in JCuda

廉价感情. submitted on 2019-12-13 06:41:21
Question: I have already looked at http://www.javacodegeeks.com/2011/10/gpgpu-with-jcuda-good-bad-and-ugly.html, which says I must modify my kernel to take only one-dimensional arrays. However, I refuse to believe that it is impossible to create a struct and copy it to device memory in JCuda. I would imagine the usual implementation would be to create a case class (Scala terminology) that extends some native API and can then be turned into a struct that can be safely passed into the kernel.
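One possible approach, sketched under assumptions: a kernel sees a struct as a plain block of bytes, so the Java side can write the fields into a direct ByteBuffer in the same order and byte order as the C declaration. The Particle struct, its fields, and the StructPacking helper below are hypothetical and chosen only for illustration. With JCuda the packed buffer could then be wrapped with Pointer.to(buffer) and copied to the device with cuMemcpyHtoD; that step is left out here so the sketch runs without a GPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical struct, assumed to match this declaration in the .cu file:
//   struct Particle { float x; float y; int id; };
final class Particle {
    // Three 4-byte fields; this layout needs no padding. Structs that mix
    // field sizes (double, pointers) need the C padding reproduced by hand.
    static final int SIZE_BYTES = 12;

    final float x;
    final float y;
    final int id;

    Particle(float x, float y, int id) {
        this.x = x;
        this.y = y;
        this.id = id;
    }

    // Write the fields in declaration order at the buffer's current position.
    void writeTo(ByteBuffer buffer) {
        buffer.putFloat(x).putFloat(y).putInt(id);
    }

    // Read one struct back, e.g. after copying results from the device.
    static Particle readFrom(ByteBuffer buffer) {
        return new Particle(buffer.getFloat(), buffer.getFloat(), buffer.getInt());
    }
}

final class StructPacking {
    // Pack an array of structs into one contiguous native-order buffer.
    static ByteBuffer pack(Particle[] particles) {
        ByteBuffer buffer = ByteBuffer
                .allocateDirect(particles.length * Particle.SIZE_BYTES)
                .order(ByteOrder.nativeOrder());
        for (Particle p : particles) {
            p.writeTo(buffer);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        Particle[] particles = { new Particle(1.5f, -2.0f, 7), new Particle(0f, 3f, 8) };
        ByteBuffer packed = pack(particles);
        System.out.println("packed bytes: " + packed.remaining());
    }
}
```

The kernel would then take a Particle* parameter and index it as usual; the one-dimensional-array restriction applies to how Java lays the bytes out, not to what the kernel can declare.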

JNI libraries deallocate memory upon garbage collection?

做~自己de王妃 submitted on 2019-12-11 17:38:09
Question: I am using JCuda and would like to know whether the JNI objects are smart enough to deallocate their memory when they are garbage collected. I can understand why this may not work in all situations, but I know it will work in my situation, so my follow-up question is: how can I accomplish this? Is there a "mode" I can set? Will I need to build a layer of abstraction? Or maybe the answer really is "no, don't ever try that", and if so, why not? EDIT: I'm referring only to native objects created via JNI, not Java
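A minimal sketch of the "layer of abstraction" option, assuming native memory has to be released explicitly (as with cuMemFree in the CUDA driver API) and that Java 9 or later is available for java.lang.ref.Cleaner. The NativeBuffer class, its long handle, and the FREED_HANDLES list are hypothetical; the list stands in for the real native free call so the sketch runs without a GPU.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper that ties the lifetime of a native handle to a Java object.
final class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Stand-in for the native free call: records which handles were released.
    static final List<Long> FREED_HANDLES =
            Collections.synchronizedList(new ArrayList<>());

    // The cleanup action must not reference the NativeBuffer itself,
    // otherwise the object could never become unreachable.
    private static final class Release implements Runnable {
        private final long handle;

        Release(long handle) {
            this.handle = handle;
        }

        @Override
        public void run() {
            // A real implementation would call the native free function here.
            FREED_HANDLES.add(handle);
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(long handle) {
        this.cleanable = CLEANER.register(this, new Release(handle));
    }

    // Deterministic release; the Cleaner runs the action at most once,
    // so a later garbage collection will not free the handle again.
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Used in a try-with-resources block, the handle is released at a known point; the Cleaner only acts as a safety net if close() is forgotten. Relying on garbage collection alone is risky because the collector tracks Java heap pressure, not device memory, so the GPU can run out of memory long before a collection happens.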

What is the easiest way to run working CUDA code in Java?

℡╲_俬逩灬. submitted on 2019-12-04 16:02:13
I have some CUDA code I wrote in C, and it seems to be working fine (it's plain old C, not C++). I'm running a Hadoop cluster and want to consolidate my code, so ideally I'd like to run it from within Java (long story short: the system is too complex). Currently the C program parses a log file, takes a few thousand lines, processes each line in parallel on the GPU, saves specific errors/transactions into a linked list, and then writes them to disk. What is the best approach to do this? Is JCuda a perfect mapping to C CUDA, or is it totally different? Or does it make sense to call C code
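One low-effort option for calling the existing C code, sketched under the assumption that the CUDA program stays a standalone executable that writes its results to standard output: Java starts it as a child process and collects the output lines. The NativeRunner class and the command passed to it are hypothetical; this is not JCuda and involves no JNI, so it trades integration depth for leaving the working C code untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs an external program and returns its output lines.
final class NativeRunner {
    static List<String> run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("command failed with exit code " + exitCode);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for command", e);
        }
    }
}
```

A call such as NativeRunner.run(Arrays.asList("/path/to/cuda_log_parser", "input.log")) would return the lines the C program printed, where the path and argument are placeholders. If the kernels need to be launched directly from Java, compiling them to PTX and loading them through JCuda is the heavier alternative.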

Loading multiple modules in JCuda is not working

不打扰是莪最后的温柔 submitted on 2019-12-01 09:07:54
In JCuda one can load CUDA files in PTX or CUBIN format and launch __global__ functions (kernels) from Java. With that in mind, I want to develop a framework with JCuda that takes a user's __device__ function from a .cu file at run time, then loads and runs it. I have already implemented a __global__ function in which each thread finds the start point of its related data, performs some computation and initialization, and then calls the user's __device__ function. Here is my kernel pseudocode:
extern "C" __device__ void userFunc(args);
extern "C" __global__ void kernel(){
// initialize

JIT in JCuda, loading multiple ptx modules

依然范特西╮ submitted on 2019-11-28 12:40:05
In an earlier question I described a problem loading PTX modules in JCuda. Following @talonmies's suggestion, I implemented a JCuda version of his solution that loads multiple PTX files and links them into a single module. Here is the relevant part of the code:
import static jcuda.driver.JCudaDriver.cuLinkAddFile;
import static jcuda.driver.JCudaDriver.cuLinkComplete;
import static jcuda.driver.JCudaDriver.cuLinkCreate;
import static jcuda.driver.JCudaDriver.cuLinkDestroy;
import static jcuda.driver.JCudaDriver.cuModuleGetFunction;
import static jcuda.driver.JCudaDriver.cuModuleLoadData;
import jcuda