gpgpu

Are GPU/CUDA cores SIMD ones?

心已入冬 submitted on 2019-12-03 12:23:05
Let's take the NVIDIA Fermi Compute Architecture. It says:

The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The 512 CUDA cores are organized in 16 SMs of 32 cores each. [...] Each CUDA processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). [...] In Fermi, the newly designed integer ALU supports full 32-bit precision for all instructions, consistent with standard programming language requirements. The integer ALU is …

CUDA - why is warp based parallel reduction slower?

情到浓时终转凉″ submitted on 2019-12-03 12:15:06
I had the idea of a warp-based parallel reduction, since all threads of a warp are in sync by definition. So the idea was that the input data could be reduced by a factor of 64 (each thread reduces two elements) without any need for synchronization. As in the original implementation by Mark Harris, the reduction is applied at block level and the data is in shared memory: http://gpgpu.org/static/sc2007/SC07_CUDA_5_Optimization_Harris.pdf I created a kernel to test his version and my warp-based version. The kernel itself is completely identical, storing BLOCK_SIZE elements in shared memory and outputting …
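
For reference, a minimal sketch of the warp-synchronous pattern being described, in the style of Harris's slides (BLOCK_SIZE and the kernel names are illustrative; note that on Volta and newer GPUs the implicit warp synchronization this relies on is no longer guaranteed and needs __syncwarp() or shuffle intrinsics instead):

```cuda
// Classic warp-synchronous reduction of the kind discussed above.
// Assumes pre-Volta hardware, where the 32 threads of a warp run in lockstep.
#define BLOCK_SIZE 128

__device__ void warpReduce(volatile float *sdata, int tid) {
    // No __syncthreads() needed: all participating threads are in one warp.
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}

__global__ void reduceKernel(const float *in, float *out) {
    __shared__ float sdata[BLOCK_SIZE];
    int tid = threadIdx.x;
    // Each thread loads and adds two elements (the "factor 64" above comes
    // from 32 warp lanes each combining two inputs).
    int i = blockIdx.x * (BLOCK_SIZE * 2) + tid;
    sdata[tid] = in[i] + in[i + BLOCK_SIZE];
    __syncthreads();

    // Block-level tree reduction down to one warp's worth of data.
    for (int s = BLOCK_SIZE / 2; s > 32; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid < 32) warpReduce(sdata, tid);
    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```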

What is coherent memory on GPU?

五迷三道 submitted on 2019-12-03 11:05:55
Question: I have stumbled more than once on the terms "non-coherent" and "coherent" memory in tech papers related to graphics programming. I have been searching for a simple and clear explanation, but found mostly 'hardcore' papers of this type. I would be glad to receive a layman's-style answer on what coherent memory actually is on GPU architectures and how it compares to other (presumably non-coherent) memory types.

Answer 1: Memory is memory. But different things can access that memory. The GPU can access …
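
As a concrete illustration (my own sketch, not from the answer): CUDA's mapped pinned memory is one example of memory that both the CPU and GPU can address, with the driver and hardware keeping the two views consistent:

```cuda
// Pinned, mapped ("zero-copy") host memory visible to both CPU and GPU.
// Keeping the two sides' view of such shared memory consistent is what
// "coherent" refers to; the price here is uncached, over-the-bus GPU access.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *p) { p[threadIdx.x] += 1; }

int main() {
    int *hostPtr = nullptr, *devPtr = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);            // enable mapped memory
    cudaHostAlloc(&hostPtr, 32 * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0); // GPU alias

    for (int i = 0; i < 32; ++i) hostPtr[i] = i;      // CPU writes
    increment<<<1, 32>>>(devPtr);                     // GPU reads and writes
    cudaDeviceSynchronize();                          // make GPU writes visible
    printf("%d\n", hostPtr[0]);                       // CPU sees 1
    cudaFreeHost(hostPtr);
    return 0;
}
```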

What should a very simple Makefile look like for compiling CUDA under Linux?

懵懂的女人 submitted on 2019-12-03 09:34:49
Question: I want to compile a very basic hello-world-level CUDA program under Linux. I have three files:

the kernel: helloWorld.cu
main method: helloWorld.cpp
common header: helloWorld.h

Could you write me a simple Makefile to compile this with nvcc and g++? Thanks, Gabor

Answer 1: Just in case, here's my variant. I use it to compile CUDA projects on Mac, but I think it will suit Linux too. It requires the CUDA SDK.

BINDIR = ./ # places compiled binary in current directory
EXECUTABLE := helloWorld
CCFILES := …
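
Since the answer's Makefile is cut off above, here is a minimal sketch in the same spirit for the three files from the question (CUDA_PATH and the object-file names are my own assumptions; adjust for your installation):

```make
# Minimal sketch: compile the .cu with nvcc, the .cpp with g++, link with nvcc.
# The kernel object gets a distinct name so it doesn't clash with helloWorld.o.
CUDA_PATH ?= /usr/local/cuda
NVCC      := $(CUDA_PATH)/bin/nvcc
CXX       := g++

helloWorld: helloWorld.o helloWorldKernel.o
	$(NVCC) -o $@ $^

helloWorld.o: helloWorld.cpp helloWorld.h
	$(CXX) -I$(CUDA_PATH)/include -c $< -o $@

helloWorldKernel.o: helloWorld.cu helloWorld.h
	$(NVCC) -c $< -o $@

clean:
	rm -f helloWorld *.o
```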

Is it possible to bind an OpenCV GpuMat as an OpenGL texture?

五迷三道 submitted on 2019-12-03 07:36:35
Question: I haven't been able to find any reference except for http://answers.opencv.org/question/9512/how-to-bind-gpumat-to-texture/, which discusses a CUDA approach. Ideally I'd like to update an OpenGL texture with the contents of a cv::gpu::GpuMat without copying back to the CPU, and without directly using CUDA (although I presume this may be necessary until this feature is added).

Answer 1: OpenCV has OpenGL support. See the opencv2/core/opengl_interop.hpp header file. You can copy the content of a GpuMat to …
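
For a sense of what that copy looks like in code, a minimal sketch, assuming a modern OpenCV build (3.x/4.x, where the interop API lives in opencv2/core/opengl.hpp) with CUDA and OpenGL support; names may differ in the 2.x series the answer refers to:

```cpp
// Device-to-device upload of a GpuMat into an OpenGL texture.
// An OpenGL context must be current on this thread before these calls.
#include <opencv2/core/opengl.hpp>
#include <opencv2/core/cuda.hpp>

void uploadToTexture(const cv::cuda::GpuMat &frame, cv::ogl::Texture2D &tex) {
    tex.copyFrom(frame);  // copies on the GPU, no round trip through the CPU
    tex.bind();           // binds the underlying GL texture for drawing
}
```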

How many 'CUDA cores' does each multiprocessor of a GPU have?

谁都会走 submitted on 2019-12-03 07:34:39
I know that devices before the Fermi architecture had 8 SPs in a single multiprocessor. Is the count the same in the Fermi architecture? The number of multiprocessors (MP) and the number of cores per MP can be found by executing DeviceQuery.exe, found in the %NVSDKCOMPUTE_ROOT%/C/bin directory of the GPU Computing SDK installation. A look at the code of DeviceQuery (found in %NVSDKCOMPUTE_ROOT%/C/src/DeviceQuery) reveals that the number of cores is calculated by passing the x.y CUDA capability numbers to the ConvertSMVer2Cores utility function. From the code of ConvertSMVer2Cores this …
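
A minimal sketch of what DeviceQuery does under the hood, with a cut-down version of the ConvertSMVer2Cores mapping (only the pre-Fermi and Fermi entries are shown; the real SDK table covers more architectures):

```cuda
// Read the compute capability and map it to cores per multiprocessor.
#include <cstdio>
#include <cuda_runtime.h>

static int coresPerSM(int major, int minor) {
    if (major == 1) return 8;                       // Tesla: 8 SPs per SM
    if (major == 2) return (minor == 0) ? 32 : 48;  // Fermi: GF100 / GF104
    return -1;                                      // unknown to this sketch
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%d SMs, compute capability %d.%d, %d cores/SM\n",
           prop.multiProcessorCount, prop.major, prop.minor,
           coresPerSM(prop.major, prop.minor));
    return 0;
}
```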

Is it worth offloading FFT computation to an embedded GPU?

て烟熏妆下的殇ゞ submitted on 2019-12-03 06:11:01
We are considering porting an application from a dedicated digital signal processing chip to run on generic x86 hardware. The application does a lot of Fourier transforms, and from brief research it appears that FFTs are fairly well suited to computation on a GPU rather than a CPU. For example, this page has some benchmarks with a Core 2 Quad and a GF 8800 GTX that show a 10-fold decrease in calculation time when using the GPU: http://www.cv.nrao.edu/~pdemores/gpu/ However, in our product, size constraints restrict us to small form factors such as PC104 or Mini-ITX, and thus to rather limited …
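
For context, the GPU side of such a benchmark typically comes down to a few cuFFT calls; a minimal sketch (sizes and names are illustrative, link with -lcufft):

```cuda
// One 1D complex-to-complex FFT with cuFFT, the kind of call you would
// benchmark against the DSP implementation.
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    const int N = 4096;
    cufftComplex *data;
    cudaMalloc(&data, N * sizeof(cufftComplex));
    // ... fill `data` via cudaMemcpy from the host ...

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);            // plan once, reuse per frame
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place forward FFT
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```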

How to obtain OpenCL SDK?

半腔热情 submitted on 2019-12-03 05:38:21
Question: I was perusing the http://www.khronos.org/ web site and only found headers for OpenCL (not OpenGL, which I don't care about). How can I obtain an OpenCL SDK?

Answer 1: AMD's ATI Stream SDK works perfectly for me, and it uses multicore CPUs. I have an Intel CPU and an NVIDIA card, but it works using the CPU. Only registration is required, with no special selection process like NVIDIA requires: http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx I got it to work in Ubuntu 9.04. Just download …
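
Once any vendor's SDK is installed, a quick way to check that it works is to enumerate the platforms it exposes; a minimal sketch using the standard OpenCL C API:

```c
/* Lists the installed OpenCL platforms (e.g. the ATI Stream CPU platform
 * mentioned above). Link against the vendor's OpenCL library. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint count = 0;
    clGetPlatformIDs(8, platforms, &count);
    cl_uint n = count < 8 ? count : 8;
    for (cl_uint i = 0; i < n; ++i) {
        char name[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("Platform %u: %s\n", i, name);
    }
    return 0;
}
```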

What do work items execute when conditionals are used in GPU programming?

空扰寡人 submitted on 2019-12-03 03:23:40
If you have work items executing in a wavefront and there is a conditional such as:

if(x){ ... } else{ .... }

what do the work items execute? Is it the case that all work items in the wavefront will execute the first branch (i.e. x == true), and if there are no work items for which x is false, the rest of the conditional is skipped? What happens if one work item takes the alternative path? Am I to understand that all work items will execute the alternative path as well (therefore executing both paths)? Why is this the case, and how does it not mess up the program execution?

talonmies: NVIDIA GPUs use …
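
A minimal CUDA sketch of the situation being asked about (illustrative only):

```cuda
// Divergence within a warp (NVIDIA's equivalent of a wavefront): lanes where
// the predicate is true execute the if-body while the other lanes are masked
// off, then the roles swap for the else-body. Both paths are issued unless
// the predicate is uniform across the warp; the masking is why results don't
// get mixed up.
__global__ void divergent(const int *x, int *out) {
    int tid = threadIdx.x;
    if (x[tid]) {
        out[tid] = 1;   // executed with the false lanes masked off
    } else {
        out[tid] = 2;   // then executed with the true lanes masked off
    }
}
```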

NVIDIA vs AMD: GPGPU performance

半世苍凉 submitted on 2019-12-03 01:47:09
Question: I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all.) On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today, …