gpu-programming

Hash table implementation for GPU [closed]

自作多情 · submitted on 2019-12-03 22:06:42
Question: I am looking for a hash table implementation that I can use for CUDA coding. Are there any good ones out there? Something like the Python dictionary. I will use strings as my keys. Answer 1: Alcantara et al. have demonstrated a data-parallel algorithm for building hash tables on the GPU. I believe the implementation …
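The Alcantara et al. work the answer points to is a full cuckoo-hashing scheme (later included in CUDPP, if I recall correctly). As a much smaller, hedged sketch of the general idea rather than that implementation, the kernel below fills a fixed-size, open-addressing table with linear probing, using atomicCAS so that concurrent threads cannot claim the same slot twice. Keys are 32-bit integers; string keys would first have to be hashed down to integers. All names and constants here (kEmpty, kCapacity, the mixing constants in hashKey) are arbitrary choices for the sketch.

```
// A minimal sketch, NOT the Alcantara/CUDPP implementation: a fixed-size
// open-addressing hash table with linear probing. One thread inserts one
// (key, value) pair; atomicCAS guarantees that only a single thread can
// claim any empty slot. Integer keys only -- string keys would have to be
// hashed to 32-bit integers first.
#include <cstdint>

constexpr uint32_t kEmpty    = 0xffffffffu;  // sentinel for "slot unused"
constexpr uint32_t kCapacity = 1u << 20;     // table size, power of two

struct Slot { uint32_t key; uint32_t value; };

__device__ uint32_t hashKey(uint32_t k) {
    // A simple integer mixer; the constants are arbitrary for this sketch.
    k ^= k >> 16; k *= 0x85ebca6bu;
    k ^= k >> 13; k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k & (kCapacity - 1);
}

__global__ void insertKernel(Slot* table, const uint32_t* keys,
                             const uint32_t* values, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint32_t slot = hashKey(keys[i]);
    for (;;) {
        // Try to claim the slot; 'prev' is whatever the slot held before.
        uint32_t prev = atomicCAS(&table[slot].key, kEmpty, keys[i]);
        if (prev == kEmpty || prev == keys[i]) {
            table[slot].value = values[i];   // we own (or match) this slot
            return;
        }
        slot = (slot + 1) & (kCapacity - 1); // occupied by another key: probe on
    }
}
```

The table would be allocated with cudaMalloc, initialized to all 0xff bytes (so every key reads as kEmpty), and the kernel launched with one thread per key; lookups walk the same probe sequence. The sketch does not handle a full table or concurrent updates of duplicate keys.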

How to determine maximum batch size for a seq2seq tensorflow RNN training model

北城以北 · submitted on 2019-12-03 16:41:40
Currently, I am using the default of 64 as the batch size for the seq2seq TensorFlow model. What is the maximum batch size, layer size, etc. that I can go with on a single Titan X GPU with 12 GB of RAM, on a Haswell-E Xeon machine with 128 GB of RAM? The input data is converted to embeddings. Following are some of the relevant parameters I am using; it seems the cell input size is 1024: encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. tf.app.flags.DEFINE_integer("size", 1024, "Size of each model layer.") So based on my hardware, what is the …
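As a rough, assumption-laden back-of-envelope check rather than an answer: one float32 tensor of shape [batch_size x cell.input_size] = [64 x 1024] takes 64 × 1024 × 4 bytes ≈ 256 KB, and an activation of roughly that shape has to be kept for every unrolled time step of every layer so that backpropagation can run, on top of the parameters, their gradients, the optimizer state and the output softmax over the vocabulary. Activation memory therefore grows roughly linearly with batch size, sequence length and layer count, and the practical maximum batch size on a 12 GB card is usually found empirically, by increasing it until the first out-of-memory error.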

How do I test OpenCL on GPU when logged in remotely on Mac?

泄露秘密 · submitted on 2019-12-03 12:12:18
My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, the output is as follows (notice the "GeForce 9400" line):
2 devices found
Device #0 name = GeForce 9400
Device #1 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
but if it is just me, over ssh, there is no GeForce 9400 entry:
1 devices found
Device #0 name = …

What is coherent memory on GPU?

五迷三道 · submitted on 2019-12-03 11:05:55
Question: I have stumbled more than once onto the terms "non-coherent" and "coherent" memory in tech papers related to graphics programming. I have been searching for a simple and clear explanation, but have found mostly 'hardcore' papers of this type. I would be glad to receive a layman's-style answer on what coherent memory actually is on GPU architectures and how it compares to other (probably non-coherent) memory types. Answer 1: Memory is memory. But different things can access that memory. The GPU can access …
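As a concrete, hedged illustration in CUDA terms (the papers the question refers to usually mean graphics-API memory types such as Vulkan's HOST_COHERENT property, which the following only approximates): pinned host memory can be mapped into the GPU's address space so that both processors address the same buffer, and synchronization is still what makes one side's writes visible to the other.

```
// Sketch: a single buffer visible to both CPU and GPU (CUDA "zero-copy"
// mapped pinned memory). It illustrates memory shared between two clients;
// graphics-API coherency flags govern whether explicit flush/invalidate
// calls are needed on top of this.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;               // GPU writes straight into host memory
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapping host allocations

    const int n = 256;
    int* hostPtr = nullptr;
    cudaHostAlloc((void**)&hostPtr, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) hostPtr[i] = i;   // CPU fills the buffer

    int* devPtr = nullptr;
    cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0);
    addOne<<<1, n>>>(devPtr, n);
    cudaDeviceSynchronize();                // make the GPU's writes visible here

    printf("%d %d\n", hostPtr[0], hostPtr[n - 1]);  // expected: 1 256
    cudaFreeHost(hostPtr);
    return 0;
}
```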

What do work items execute when conditionals are used in GPU programming?

空扰寡人 · submitted on 2019-12-03 03:23:40
If you have work-items executing in a wavefront and there is a conditional such as: if(x){ ... } else{ .... } What do the work-items execute? Is it the case that all work-items in the wavefront execute the first branch (i.e. x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped? What happens if one work-item takes the alternative path? Am I told that all work-items will execute the alternate path as well (therefore executing both paths)? Why is this the case, and how does it not mess up the program execution? Answer (talonmies): NVIDIA GPUs use …
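The question uses AMD/OpenCL terminology (wavefronts); the same mechanism applies to NVIDIA warps of 32 threads, so here is a small, purely illustrative CUDA sketch (names and the branch itself are made up for the example) of a conditional that may or may not diverge depending on the data:

```
// Branch divergence illustration. Whether both sides of the if/else are
// executed depends on the data each warp (wavefront) sees, not on the
// source code alone.
__global__ void branchDemo(const int* x, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (x[i] > 0) {
        out[i] = 2 * x[i];   // path A: threads whose predicate is true
    } else {
        out[i] = -x[i];      // path B: the remaining threads
    }
    // If every active thread in the warp agrees on x[i] > 0, only that one
    // path is issued and the other is skipped. If the warp is split, the
    // hardware runs path A with the "false" threads masked off, then path B
    // with the "true" threads masked off, and the warp reconverges here.
    // No thread's results are corrupted; the cost is the serialized time.
}
```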

Run C# code on GPU

生来就可爱ヽ(ⅴ<●) · submitted on 2019-12-02 21:52:13
I have no knowledge of GPU programming concepts and APIs. I have a few questions: Is it possible to write a piece of managed C# code and compile/translate it to some kind of module which can be executed on the GPU? Or am I doomed to having two implementations, one for managed code on the CPU and one for the GPU (I understand that there will be restrictions on what can be executed on the GPU)? Does there exist a decent and mature API for programming independently of the various GPU hardware vendors (i.e. a common API)? Are there any best practices if one wants to develop applications that run on a CPU, …

Using CUDA Profiler nvprof for memory accesses

感情迁移 · submitted on 2019-12-02 13:20:53
I'm using nvprof to get the number of global memory accesses for the following CUDA code. The number of loads in the kernel is 36 (accessing the d_In array) and the number of stores in the kernel is 36+36 (accessing the d_Out array and the d_rows array). So the total number of global memory loads is 36 and the total number of global memory stores is 72. (Basically I want to compute the compute-to-global-memory-access (CGMA) ratio.) However, when I profile the code with the nvprof CUDA profiler, it reports the following:
1  gld_transactions  Global Load Transactions   6  6  6
1  gst_transactions  Global Store …
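Two hedged notes, assuming the usual meaning of these nvprof counters: gld_transactions and gst_transactions count memory transactions, not per-thread load/store instructions, and accesses from threads of the same warp that fall into the same memory segment are coalesced into a single transaction, which is why 36 per-thread loads can legitimately show up as only a handful of transactions. For the CGMA ratio, one possible approximation from profiler counters is

CGMA ≈ flop_count_sp / (gld_transactions + gst_transactions)

though the value depends on whether an "access" is taken to mean a per-thread access, a transaction, or a byte moved, so the hand count of 36 loads and 72 stores is just as defensible a denominator.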

How to use the Hadoop MapReduce framework for an OpenCL application?

白昼怎懂夜的黑 · submitted on 2019-12-02 10:28:46
I am developing an application in OpenCL whose basic objective is to implement a data-mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and to execute the application on multiple nodes. I am using the MapReduce framework and have divided my basic algorithm into two parts, i.e. 'Map' and 'Reduce'. I have never worked with Hadoop before, so I have some questions: Do I have to write my application in Java only in order to use Hadoop and the MapReduce framework? I have written the kernel functions for map and reduce in OpenCL. Is it possible to use HDFS as a file system for a non-Java GPU …