opencl | 易学教程

clEnqueueNDRange blocking on Nvidia hardware? (Also Multi-GPU)

Question: On Nvidia GPUs, when I call clEnqueueNDRange, the program waits for it to finish before continuing. More precisely, I'm calling its equivalent C++ binding, CommandQueue::enqueueNDRange, but this shouldn't make a difference. This only happens on the remote Nvidia hardware (3 Tesla M2090s); on our office workstations with AMD GPUs, the call is nonblocking and returns immediately. I don't have local Nvidia hardware to test on. We used to, and I remember similar behavior then, too, but it's a

Does Android support OpenCL?

I recently wanted to develop a parallel computing application on Android using OpenCL. As far as I know, the Android system does not include "libopencl.so", but some websites and blogs still show OpenCL development on Android. Does Android support OpenCL? If so, what should I do to develop with OpenCL on Android? Update on May 20, 2016: For all devices with the arm64-v8a ABI, the OpenCL library may be located in the lib64 folder as well. So when you check for the OpenCL library, make sure you also check the corresponding lib64 folder (if you prefer arm64-v8a as the first ABI for your app, you may want to first
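The library check described in the excerpt above could be sketched roughly as follows. This is only an illustration: the helper names and the four candidate paths are assumptions (common locations reported for Android devices, not taken from the original post), and vendors may place the library elsewhere.

```cpp
#include <string>
#include <vector>
#include <unistd.h>  // access(), F_OK

// Hypothetical helper: returns the first path in `candidates` that exists
// on the file system, or an empty string if none of them do.
std::string firstExistingPath(const std::vector<std::string>& candidates) {
    for (const std::string& path : candidates) {
        if (access(path.c_str(), F_OK) == 0) {
            return path;
        }
    }
    return std::string();
}

// Assumed candidate locations. The lib64 folders come first so that an
// arm64-v8a build finds the 64-bit library before the 32-bit one.
std::vector<std::string> candidateOpenCLPaths() {
    return {
        "/system/vendor/lib64/libOpenCL.so",
        "/system/lib64/libOpenCL.so",
        "/system/vendor/lib/libOpenCL.so",
        "/system/lib/libOpenCL.so",
    };
}
```

Calling `firstExistingPath(candidateOpenCLPaths())` yields either a path that could then be handed to a dynamic loader or an empty string, in which case the device probably does not ship an OpenCL library in these locations.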

How to use clCreateProgramWithBinary in OpenCL?

Question: I'm just trying to get a basic program to work using clCreateProgramWithBinary. The goal is to learn how to use it, not to build a "true" application. I see that one of the parameters is a list of binaries. How exactly would I go about creating a binary to test with? I have some test code which creates a program from source, builds it, and enqueues it. Is there a binary created at some point during this process which I can feed into clCreateProgramWithBinary? Here is some of my code, just to give an

OpenCL code 'Error MSB3721' for Intel OpenCL SDK on Visual Studio 2010

Question: I am currently using Intel's OpenCL SDK platform for heterogeneous parallel programming (OpenCL), with Visual Studio 2010 Ultimate. My system doesn't have a GPU. I have previously used the CUDA SDK platform for OpenCL programming; this is the first time I am using Intel's OpenCL SDK. I have tried some basic code from the book 'OpenCL in Action' that identifies, creates, and defines platforms, devices, and contexts. It all worked fine. So we can consider that visual studio is

OpenCL: query number of processing elements

Question: Is it possible to query the number of processing elements (per compute unit) in OpenCL? If yes, how? I did not find a corresponding parameter on the clGetDeviceInfo doc page. I am not sure if processing element is standard terminology. I got the term from this video. I'd like to query this information because I am curious, not for a practical purpose. Answer 1: Processing element (PE) is the standard terminology, and no, you cannot query the number. Now I see some reasons why it's not possible: The

Are either the IPad or IPhone capable of OpenCL?

With the push towards multimedia-enabled mobile devices, this seems like a logical way to boost performance on these platforms while keeping general-purpose software power efficient. I've been interested in the iPad hardware as a development platform for UI and data display / entry usage, but I am curious about how much processing capability the device itself has. OpenCL would make it a JUICY hardware platform to develop on, even though the licensing seems like it kinda stinks. OpenCL is not yet part of iOS. However, the newer iPhones, iPod touches, and the iPad all have GPUs that

Convenient way to show OpenCL error codes?

Question: As per the title, is there a convenient way to show readable OpenCL error codes? Being able to convert codes like '-1000' to a name would save a lot of time browsing through error codes. Answer 1: This is what I currently do. I believe the error list to be complete for OpenCL 1.2. cl_int result = clSomeFunction(); if(result != CL_SUCCESS) std::cerr << getErrorString(result) << std::endl; And getErrorString is defined as follows: const char *getErrorString(cl_int error) { switch(error){ // run-time and
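The lookup approach from the answer above can be illustrated with a small sketch. It is not the answerer's full list: the function name `clErrorName` is hypothetical, only a few well-known status codes are covered, and codes are taken as plain `int` (an assumption made so the sketch compiles without the OpenCL headers).

```cpp
#include <string>

// Hypothetical helper: maps a small subset of OpenCL status codes to their
// symbolic names. A real table would cover every code in the headers.
std::string clErrorName(int code) {
    switch (code) {
        // Run-time and JIT compiler errors
        case 0:     return "CL_SUCCESS";
        case -1:    return "CL_DEVICE_NOT_FOUND";
        case -2:    return "CL_DEVICE_NOT_AVAILABLE";
        case -3:    return "CL_COMPILER_NOT_AVAILABLE";
        case -4:    return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:    return "CL_OUT_OF_RESOURCES";
        case -6:    return "CL_OUT_OF_HOST_MEMORY";
        case -11:   return "CL_BUILD_PROGRAM_FAILURE";
        // Compile-time (argument validation) errors
        case -30:   return "CL_INVALID_VALUE";
        case -54:   return "CL_INVALID_WORK_GROUP_SIZE";
        // Extension errors
        case -1000: return "CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR";
        case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";
        // Anything not listed falls through with its numeric value kept.
        default:    return "Unknown OpenCL error " + std::to_string(code);
    }
}
```

Keeping the numeric value in the fallback string means an unlisted code can still be looked up by hand in the headers.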

C# Rendering OpenCL-generated image

Question: Problem: I'm trying to render a dynamic Julia fractal in real time. Because the fractal is constantly changing, I need to be able to render at least 20 frames per second, preferably more. What you need to know about a Julia fractal is that every pixel can be calculated independently, so the task is easily parallelizable. First approach: Because I'm already used to MonoGame in C#, I tried writing a shader in HLSL that would do the job, but the compiler kept complaining because I used up more

Parallel Programs

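1. The State of the Big Data Era. According to statistics, more than 500 hours of video are added to YouTube every minute. Faced with such massive volumes of data, storing and processing them efficiently has become the biggest challenge today. Yet in an era that keeps demanding more from hardware, the CPU no longer seems to keep up. Since 2013, growth in processor frequency has gradually slowed, and CPU frequencies now mostly fall in the 3 to 4 GHz range. CPU frequency is in fact closely tied to power consumption: frequency used to be raised by increasing the voltage, but once power consumption grows too large, heat dissipation can no longer be handled, so frequencies gradually leveled off. Major manufacturers such as Intel and AMD shifted their focus to multi-core chips, and ordinary desktop PCs now have 4 to 8 cores. As the number of transistors on an integrated circuit keeps growing, rising power consumption and overheating make it harder to add more transistors, so the exponential growth predicted by Moore's Law is bound to slow. In that sense, Moore's Law is breaking down. Now and over the next five years, microprocessor technology is moving toward multi-core designs, which make full use of the chip area that Moore's Law provides by placing multiple processor cores on a die and adopting more advanced techniques to reduce power consumption. Multi-core parallel computing is not limited to the CPU; it can also use the GPU (graphics processing unit). A single GPU can have up to thousands of cores and can run thousands of threads simultaneously. So how do we use the GPU for parallel computing? One option is Nvidia's CUDA library. When should parallel computing be used? Multi-core CPU, compute-intensive tasks: use parallel computing wherever possible, since it improves task execution efficiency. Compute-intensive tasks keep the CPU continuously saturated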

Writing to global or local memory increases kernel execution time by 10000 %

I have the following OpenCL kernel: kernel void ndft( global float *re, global float *im, int num_values, global float *spectrum_re, global float *spectrum_im, global float *spectrum_abs, global float *sin_array, global float *cos_array, float sqrt_num_values_reciprocal) { // MATH MAGIC - DISREGARD FROM HERE ----------- float x; float y; float sum_re = 0; float sum_im = 0; size_t thread_id = get_global_id(0); //size_t local_id = get_local_id(0); // num_values = 24 (live environment), 48 (test) for (int i = 0; i < num_values; i++) { x = cos_array[thread_id * num_values + i] * sqrt_num_values