gpu-programming

Hash table implementation for GPU [closed]

自作多情 · submitted on 2019-12-03 22:06:42
Question: I am looking for a hash table implementation that I can use for CUDA coding. Are there any good ones out there? Something like the Python dictionary. I will use strings as my keys. Answer 1: Alcantara et al. have demonstrated a data-parallel algorithm for building hash tables on the GPU. I believe the implementation …
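The Alcantara et al. work the answer points to is a full cuckoo-hashing scheme (later included in CUDPP, if I recall correctly). As a much smaller, hedged sketch of the general idea rather than that implementation, the kernel below fills a fixed-size, open-addressing table with linear probing, using atomicCAS so that concurrent threads cannot claim the same slot twice. Keys are 32-bit integers; string keys would first have to be hashed down to integers. All names and constants here (kEmpty, kCapacity, the mixing constants in hashKey) are arbitrary choices for the sketch.

```
// A minimal sketch, NOT the Alcantara/CUDPP implementation: a fixed-size
// open-addressing hash table with linear probing. One thread inserts one
// (key, value) pair; atomicCAS guarantees that only a single thread can
// claim any empty slot. Integer keys only -- string keys would have to be
// hashed to 32-bit integers first.
#include <cstdint>

constexpr uint32_t kEmpty    = 0xffffffffu;  // sentinel for "slot unused"
constexpr uint32_t kCapacity = 1u << 20;     // table size, power of two

struct Slot { uint32_t key; uint32_t value; };

__device__ uint32_t hashKey(uint32_t k) {
    // A simple integer mixer; the constants are arbitrary for this sketch.
    k ^= k >> 16; k *= 0x85ebca6bu;
    k ^= k >> 13; k *= 0xc2b2ae35u;
    k ^= k >> 16;
    return k & (kCapacity - 1);
}

__global__ void insertKernel(Slot* table, const uint32_t* keys,
                             const uint32_t* values, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint32_t slot = hashKey(keys[i]);
    for (;;) {
        // Try to claim the slot; 'prev' is whatever the slot held before.
        uint32_t prev = atomicCAS(&table[slot].key, kEmpty, keys[i]);
        if (prev == kEmpty || prev == keys[i]) {
            table[slot].value = values[i];   // we own (or match) this slot
            return;
        }
        slot = (slot + 1) & (kCapacity - 1); // occupied by another key: probe on
    }
}
```

The table would be allocated with cudaMalloc, initialized to all 0xff bytes (so every key reads as kEmpty), and the kernel launched with one thread per key; lookups walk the same probe sequence. The sketch does not handle a full table or concurrent updates of duplicate keys.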

How to determine maximum batch size for a seq2seq tensorflow RNN training model

北城以北 · submitted on 2019-12-03 16:41:40
Currently, I am using the default of 64 as the batch size for the seq2seq TensorFlow model. What is the maximum batch size, layer size, etc. that I can go with on a single Titan X GPU with 12 GB of RAM, on a Haswell-E Xeon machine with 128 GB of RAM? The input data is converted to embeddings. Following are some of the relevant parameters I am using; it seems the cell input size is 1024: encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. tf.app.flags.DEFINE_integer("size", 1024, "Size of each model layer.") So based on my hardware, what is the …
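As a rough, assumption-laden back-of-envelope check rather than an answer: one float32 tensor of shape [batch_size x cell.input_size] = [64 x 1024] takes 64 × 1024 × 4 bytes ≈ 256 KB, and an activation of roughly that shape has to be kept for every unrolled time step of every layer so that backpropagation can run, on top of the parameters, their gradients, the optimizer state and the output softmax over the vocabulary. Activation memory therefore grows roughly linearly with batch size, sequence length and layer count, and the practical maximum batch size on a 12 GB card is usually found empirically, by increasing it until the first out-of-memory error.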

How do I test OpenCL on GPU when logged in remotely on Mac?

泄露秘密 · submitted on 2019-12-03 12:12:18
My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, the output is as follows (notice the "GeForce 9400" line):
2 devices found
Device #0 name = GeForce 9400
Device #1 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
but if it is just me, over ssh, there is no GeForce 9400 entry:
1 devices found
Device #0 name = …

What is coherent memory on GPU?

五迷三道 · submitted on 2019-12-03 11:05:55
Question: I have stumbled more than once onto the terms "non-coherent" and "coherent" memory in tech papers related to graphics programming. I have been searching for a simple and clear explanation, but have found mostly 'hardcore' papers of this type. I would be glad to receive a layman's-style answer on what coherent memory actually is on GPU architectures and how it compares to other (probably non-coherent) memory types. Answer 1: Memory is memory. But different things can access that memory. The GPU can access …
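As a concrete, hedged illustration in CUDA terms (the papers the question refers to usually mean graphics-API memory types such as Vulkan's HOST_COHERENT property, which the following only approximates): pinned host memory can be mapped into the GPU's address space so that both processors address the same buffer, and synchronization is still what makes one side's writes visible to the other.

```
// Sketch: a single buffer visible to both CPU and GPU (CUDA "zero-copy"
// mapped pinned memory). It illustrates memory shared between two clients;
// graphics-API coherency flags govern whether explicit flush/invalidate
// calls are needed on top of this.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;               // GPU writes straight into host memory
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapping host allocations

    const int n = 256;
    int* hostPtr = nullptr;
    cudaHostAlloc((void**)&hostPtr, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) hostPtr[i] = i;   // CPU fills the buffer

    int* devPtr = nullptr;
    cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0);
    addOne<<<1, n>>>(devPtr, n);
    cudaDeviceSynchronize();                // make the GPU's writes visible here

    printf("%d %d\n", hostPtr[0], hostPtr[n - 1]);  // expected: 1 256
    cudaFreeHost(hostPtr);
    return 0;
}
```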

What do work items execute when conditionals are used in GPU programming?

空扰寡人 · submitted on 2019-12-03 03:23:40
If you have work-items executing in a wavefront and there is a conditional such as: if(x){ ... } else{ .... } What do the work-items execute? Is it the case that all work-items in the wavefront execute the first branch (i.e. x == true), and if there are no work-items for which x is false, the rest of the conditional is skipped? What happens if one work-item takes the alternative path? Am I told that all work-items will execute the alternate path as well (therefore executing both paths)? Why is this the case, and how does it not mess up the program execution? Answer (talonmies): NVIDIA GPUs use …
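The question uses AMD/OpenCL terminology (wavefronts); the same mechanism applies to NVIDIA warps of 32 threads, so here is a small, purely illustrative CUDA sketch (names and the branch itself are made up for the example) of a conditional that may or may not diverge depending on the data:

```
// Branch divergence illustration. Whether both sides of the if/else are
// executed depends on the data each warp (wavefront) sees, not on the
// source code alone.
__global__ void branchDemo(const int* x, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (x[i] > 0) {
        out[i] = 2 * x[i];   // path A: threads whose predicate is true
    } else {
        out[i] = -x[i];      // path B: the remaining threads
    }
    // If every active thread in the warp agrees on x[i] > 0, only that one
    // path is issued and the other is skipped. If the warp is split, the
    // hardware runs path A with the "false" threads masked off, then path B
    // with the "true" threads masked off, and the warp reconverges here.
    // No thread's results are corrupted; the cost is the serialized time.
}
```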

Run C# code on GPU

生来就可爱ヽ(ⅴ<●) · submitted on 2019-12-02 21:52:13
I have no knowledge of GPU programming concepts and APIs. I have a few questions: Is it possible to write a piece of managed C# code and compile/translate it to some kind of module which can be executed on the GPU? Or am I doomed to having two implementations, one for managed code on the CPU and one for the GPU (I understand that there will be restrictions on what can be executed on the GPU)? Does there exist a decent and mature API for programming independently of the various GPU hardware vendors (i.e. a common API)? Are there any best practices if one wants to develop applications that run on a CPU, …

Using CUDA Profiler nvprof for memory accesses

感情迁移 · submitted on 2019-12-02 13:20:53
I'm using nvprof to get the number of global memory accesses for the following CUDA code. The number of loads in the kernel is 36 (accessing the d_In array) and the number of stores in the kernel is 36+36 (accessing the d_Out array and the d_rows array). So the total number of global memory loads is 36 and the total number of global memory stores is 72. (Basically I want to compute the compute-to-global-memory-access (CGMA) ratio.) However, when I profile the code with the nvprof CUDA profiler, it reports the following:
1  gld_transactions  Global Load Transactions   6  6  6
1  gst_transactions  Global Store …
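Two hedged notes, assuming the usual meaning of these nvprof counters: gld_transactions and gst_transactions count memory transactions, not per-thread load/store instructions, and accesses from threads of the same warp that fall into the same memory segment are coalesced into a single transaction, which is why 36 per-thread loads can legitimately show up as only a handful of transactions. For the CGMA ratio, one possible approximation from profiler counters is

CGMA ≈ flop_count_sp / (gld_transactions + gst_transactions)

though the value depends on whether an "access" is taken to mean a per-thread access, a transaction, or a byte moved, so the hand count of 36 loads and 72 stores is just as defensible a denominator.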

How to use the Hadoop MapReduce framework for an OpenCL application?

白昼怎懂夜的黑 · submitted on 2019-12-02 10:28:46
I am developing an application in OpenCL whose basic objective is to implement a data-mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and to execute the application on multiple nodes. I am using the MapReduce framework and have divided my basic algorithm into two parts, i.e. 'Map' and 'Reduce'. I have never worked with Hadoop before, so I have some questions: Do I have to write my application in Java only in order to use Hadoop and the MapReduce framework? I have written the kernel functions for map and reduce in OpenCL. Is it possible to use HDFS as a file system for a non-Java GPU …