gpu-programming

3D Graphics Picking - What is the best approach for this scenario

Submitted by 独自空忆成欢 on 2019-12-24 00:32:26
Question: I am working on a project which allows users to pick 3D objects in a scene, and I was wondering what everyone thought would be the best way to approach this particular scenario. Basically we have a scene with at least 100 objects (they are low-poly, but made from at least ~12-15 triangles each) and up to about 1000-2000 objects. Not all the objects will be "pickable" at all times, because some objects will occlude others, so "pickable" objects probably land in the range between 800-1500 (depending on
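
For a scene of this size, one common approach is CPU-side ray casting: unproject the mouse position into a world-space ray and intersect it against each pickable object's triangles (usually after a cheap bounding-volume rejection), keeping the hit with the smallest distance. Below is a minimal sketch of the core ray-triangle test (Möller-Trumbore) in C++; the Vec3 type and its helpers are hypothetical stand-ins for whatever math types the engine already provides.

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
    static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
    static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Returns true and the hit distance *t if the ray (origin, dir) hits the
    // triangle (v0, v1, v2). Standard Moeller-Trumbore formulation.
    bool rayTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float* t)
    {
        const float kEps = 1e-7f;
        Vec3 e1 = sub(v1, v0);
        Vec3 e2 = sub(v2, v0);
        Vec3 p  = cross(dir, e2);
        float det = dot(e1, p);
        if (std::fabs(det) < kEps) return false;   // ray parallel to triangle plane
        float invDet = 1.0f / det;
        Vec3 s = sub(origin, v0);
        float u = dot(s, p) * invDet;
        if (u < 0.0f || u > 1.0f) return false;    // outside barycentric range
        Vec3 q = cross(s, e1);
        float v = dot(dir, q) * invDet;
        if (v < 0.0f || u + v > 1.0f) return false;
        *t = dot(e2, q) * invDet;
        return *t > kEps;                          // hit must be in front of the origin
    }

Picking then reduces to testing the mouse ray against every pickable object and keeping the smallest t. With 1000-2000 low-poly objects that is typically fast enough on the CPU; a GPU-side colour-ID render pass (drawing each object with a unique colour and reading back the pixel under the cursor) is the usual alternative when it is not.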

Cholesky decomposition with CUDA

Submitted by 让人想犯罪 __ on 2019-12-23 19:32:16
Question: I am trying to implement Cholesky decomposition using the cuSOLVER library. I am a beginner CUDA programmer and have always specified block sizes and grid sizes, but I am not able to find out how this can be set explicitly by the programmer with the cuSOLVER functions. Here is the documentation: http://docs.nvidia.com/cuda/cusolver/index.html#introduction The QR decomposition is implemented using the cuSOLVER library (see the example here: http://docs.nvidia.com/cuda/cusolver/index.html#ormqr
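
For what it's worth, with the cuSOLVER dense routines the block and grid sizes are not something the caller sets at all: you create a handle, hand the routine device pointers plus a workspace whose size the library reports, and the library launches its own kernels internally. A minimal sketch of a single-precision Cholesky factorization with cusolverDnSpotrf (most error checking omitted for brevity) might look like this:

    #include <cuda_runtime.h>
    #include <cusolverDn.h>
    #include <vector>
    #include <cstdio>

    int main()
    {
        const int n = 3, lda = n;
        // Small symmetric positive definite matrix, column-major order.
        std::vector<float> hA = { 4, 2, 2,
                                  2, 3, 1,
                                  2, 1, 3 };

        cusolverDnHandle_t handle;
        cusolverDnCreate(&handle);

        float* dA = NULL;
        int*   dInfo = NULL;
        cudaMalloc(&dA, sizeof(float) * n * n);
        cudaMalloc(&dInfo, sizeof(int));
        cudaMemcpy(dA, hA.data(), sizeof(float) * n * n, cudaMemcpyHostToDevice);

        // cuSOLVER reports how much scratch space it needs; launch sizes are internal.
        int lwork = 0;
        cusolverDnSpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, dA, lda, &lwork);

        float* dWork = NULL;
        cudaMalloc(&dWork, sizeof(float) * lwork);

        // Factor A = L * L^T; the lower triangle of dA is overwritten with L.
        cusolverDnSpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, dA, lda, dWork, lwork, dInfo);

        int info = 0;
        cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost);
        std::printf("potrf info = %d (0 means success)\n", info);

        cudaFree(dWork); cudaFree(dInfo); cudaFree(dA);
        cusolverDnDestroy(handle);
        return 0;
    }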

Simple CUDA program execution without GPU hardware using the NVIDIA GPU Computing SDK 4.0 and Microsoft VC++ 2010 Express

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-23 19:18:24
Question: I am new to GPU computing, but I've read somewhere that it's possible to execute a CUDA program without a GPU card using a simulator/emulator. I have installed NVIDIA's GPU Computing SDK 4.0 and Visual C++ 2010 Express on Windows Vista. I would like to know: Is it feasible to run CUDA code without a GPU, using NVIDIA's GPU Computing SDK 4.0 and Visual C++ 2010 Express? And why do I get the following error when I try to execute a sample program? Here is what I have: ------ Build started: Project:
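
Whatever the answer on emulation turns out to be, it is easy to confirm from code whether the CUDA runtime can see a usable device at all. A small sketch using only the standard runtime API (nothing SDK-sample specific):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess || count == 0) {
            // No CUDA-capable device (or no driver): kernels cannot run on this machine.
            std::printf("No usable GPU: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            std::printf("Device %d: %s (compute %d.%d)\n",
                        i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }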

Handling Ctrl+C exception with GPU

Submitted by 北慕城南 on 2019-12-23 02:54:08
Question: I am working with some GPU programs (using CUDA 4.1 and C), and sometimes (rarely) I have to kill the program midway using Ctrl+C to handle some exception. Earlier I tried using the cudaDeviceReset() function, but this reply by talonmies displaced my trust in cudaDeviceReset(), and hence I started handling such exceptions the old-fashioned way, that is, by restarting the computer. As the project grows in size, this method is becoming a headache. I would appreciate it if anyone has come up with a better solution.
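
One pattern that avoids restarting the machine is to catch SIGINT yourself, so that Ctrl+C triggers an orderly teardown (free device memory, then reset the device) instead of killing the process mid-kernel. A minimal sketch, assuming a long-running host loop that launches the kernels:

    #include <cuda_runtime.h>
    #include <csignal>
    #include <cstdio>

    static volatile std::sig_atomic_t gStop = 0;

    static void onSigint(int)
    {
        gStop = 1;   // only set a flag here; do the real cleanup on the main thread
    }

    int main()
    {
        std::signal(SIGINT, onSigint);

        float* dBuf = NULL;
        cudaMalloc(&dBuf, 1 << 20);

        while (!gStop) {
            // launch kernels / do one unit of work here
            cudaDeviceSynchronize();
        }

        std::printf("Ctrl+C caught, cleaning up GPU state...\n");
        cudaFree(dBuf);
        cudaDeviceReset();   // destroy the context so the next run starts clean
        return 0;
    }

Setting a flag in the handler and doing the actual cleanup on the main thread keeps the handler itself async-signal-safe.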

How to run Python on AMD GPU?

Submitted by 混江龙づ霸主 on 2019-12-22 13:53:05
Question: We are currently trying to optimize a system in which there are at least 12 variables. The total number of combinations of these variables is over 1 billion. This is not deep learning, machine learning, TensorFlow, or anything of that sort, but arbitrary calculations on time-series data. We have implemented our code in Python and successfully run it on a CPU. We also tried multiprocessing, which works well, but we need faster computation since the calculation takes weeks. We have a GPU system consisting of 6 AMD GPUs.

Nsight skips (ignores) breakpoints in VS10; CUDA works fine, but Nsight consistently skips over several breakpoints

Submitted by 喜夏-厌秋 on 2019-12-22 06:39:46
Question: I'm using Nsight 2.2, Toolkit 4.2, and the latest NVIDIA driver, and I have a couple of GPUs in my computer. The build customization is set to 4.2. I have set "generate GPU output" in the CUDA project properties, and the Nsight Monitor is on (everything looks fine). I set several breakpoints in my __global__ kernel function. Nsight stops at the declaration of the function but skips over several breakpoints; it's as if Nsight decides whether to hit a breakpoint or skip over it. The funny thing is that Nsight

What are the programming languages for GPUs?

Submitted by 天涯浪子 on 2019-12-22 05:13:18
Question: I read an article stating that GPUs are the future of supercomputing. I would like to know what programming languages are used for programming GPUs. Answer 1: OpenCL is the open, cross-platform solution and runs on both GPUs and CPUs. Another is CUDA, which is built by NVIDIA for their GPUs. HLSL and Cg are a few others. Answer 2: CUDA has quite a few language ports: http://en.wikipedia.org/wiki/CUDA Source: https://stackoverflow.com/questions/4057548/what-are-the-programming-languages-for-gpu

How to set the right alignment for an OpenCL array of structs?

Submitted by 拥有回忆 on 2019-12-21 20:55:26
Question: I have the following structure:

C++:

    struct ss {
        cl_float3 pos;
        cl_float value;
        cl_bool moved;
        cl_bool nextMoved;
        cl_int movePriority;
        cl_int nextMovePriority;
        cl_float value2;
        cl_float value3;
        cl_int neighbors[6];
        cl_float3 offsets[6];
        cl_float off1[6];
        cl_float off2[6];
    };

OpenCL:

    typedef struct {
        float3 nextPos;
        float value;
        bool moved;
        bool nextMoved;
        int movePriority;
        int nextMovePriority;
        float value2;
        float value3;
        int neighbors[6];
        float3 offsets[6];
        float off1[6];
        float off2[6];
    } ss;
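
A frequent cause of mismatches with shared structs like this is that the host and the device disagree on member sizes and alignment: cl_float3 occupies 16 bytes on the host (it is padded like a float4), and bool has no portable size in OpenCL C, so flags that cross the host/device boundary are safer declared as cl_int/int on both sides. A host-side sketch that makes the layout explicit and catches drift at compile time (member names follow the question; the int flags and the static_asserts are illustrative changes, not the original code):

    #include <CL/cl.h>   // cl_float3, cl_float, cl_int, ...

    // Host-side mirror of the kernel struct. Flags use cl_int instead of
    // cl_bool/bool so host and device agree on a 4-byte representation.
    struct ss {
        cl_float3 pos;               // 16 bytes; aligned like a float4
        cl_float  value;
        cl_int    moved;             // was cl_bool / bool
        cl_int    nextMoved;         // was cl_bool / bool
        cl_int    movePriority;
        cl_int    nextMovePriority;
        cl_float  value2;
        cl_float  value3;
        cl_int    neighbors[6];
        cl_float3 offsets[6];        // compilers insert padding before this
                                     // 16-byte-aligned member; OpenCL C does the same
        cl_float  off1[6];
        cl_float  off2[6];
    };

    // Compile-time checks (C++11). The device-side size should be compared too,
    // e.g. by having a kernel write sizeof(ss) into a buffer and reading it back.
    static_assert(sizeof(cl_float3) == 16, "cl_float3 expected to be padded to 16 bytes");
    static_assert(sizeof(ss) % 16 == 0, "struct size should be a multiple of its strictest alignment");

The kernel-side typedef would of course need the same field order, with bool replaced by int there as well.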

How to configure OpenCL in Visual Studio 2010 for NVIDIA's GPU on Windows?

Submitted by 折月煮酒 on 2019-12-21 20:44:59
Question: I am using NVIDIA's GeForce GTX 480 GPU with the Windows 7 operating system on my ASUS laptop. I have already configured Visual Studio 2010 for CUDA 4.2. How do I configure OpenCL for NVIDIA's GPU in Visual Studio 2010? I have tried every possible way. Is it possible in any way to use the CUDA Toolkit (CUDA 4.2) and NVIDIA's GPU Computing SDK to program OpenCL? If yes, then how? If not, what is the other way? Answer 1: Yes. You should be able to use Visual Studio 2010 to program for OpenCL. It should simply
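
Regarding the setup itself: pointing the project's include path at the CL headers shipped with the GPU Computing SDK and linking against OpenCL.lib is usually all Visual Studio needs, with no CUDA-specific build customization involved. A quick way to confirm the configuration is a small host program that enumerates platforms and GPU devices (plain OpenCL host API; a sketch, adjust paths and error handling to your install):

    #include <CL/cl.h>
    #include <cstdio>

    int main()
    {
        cl_uint numPlatforms = 0;
        clGetPlatformIDs(0, NULL, &numPlatforms);
        if (numPlatforms == 0) {
            std::printf("No OpenCL platforms found (is the driver installed?)\n");
            return 1;
        }
        if (numPlatforms > 8) numPlatforms = 8;

        cl_platform_id platforms[8];
        clGetPlatformIDs(numPlatforms, platforms, NULL);

        for (cl_uint p = 0; p < numPlatforms; ++p) {
            char name[256] = {0};
            clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
            std::printf("Platform %u: %s\n", p, name);

            cl_uint numDevices = 0;
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices) != CL_SUCCESS
                || numDevices == 0)
                continue;                      // no GPU devices on this platform
            if (numDevices > 8) numDevices = 8;

            cl_device_id devices[8];
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, numDevices, devices, NULL);
            for (cl_uint d = 0; d < numDevices; ++d) {
                char devName[256] = {0};
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(devName), devName, NULL);
                std::printf("  GPU device %u: %s\n", d, devName);
            }
        }
        return 0;
    }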

How to determine the maximum batch size for a seq2seq TensorFlow RNN training model

Submitted by ε祈祈猫儿з on 2019-12-21 05:14:06
Question: Currently, I am using the default of 64 as the batch size for the seq2seq TensorFlow model. What are the maximum batch size, layer size, etc. that I can use with a single Titan X GPU with 12 GB of RAM, alongside a Haswell-E Xeon with 128 GB of RAM? The input data is converted to embeddings. Here are some relevant parameters I am using; it seems the cell input size is 1024: encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size]. tf.app
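
There is no exact published formula for this; a rough back-of-envelope estimate (purely an approximation, with symbols that are not from the question) is that the unrolled recurrent activations dominate memory, so

    \[
    \text{batch\_size}_{\max} \;\approx\;
    \frac{M_{\text{GPU}} - M_{\text{params+optimizer}}}
         {\text{seq\_len}\times \text{cell\_size}\times \text{num\_layers}\times b \times k}
    \]

where b is the bytes per value (4 for float32) and k is a small constant covering the gradients and temporaries kept for backpropagation. In practice the usual procedure is empirical: increase the batch size until the first out-of-memory error and then back off.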