opencl

What is the algorithm to determine optimal work group size and number of work groups

♀尐吖头ヾ submitted on 2019-11-27 13:43:20
The OpenCL standard defines the following queries for information about a device and a compiled kernel: CL_DEVICE_MAX_COMPUTE_UNITS, CL_DEVICE_MAX_WORK_GROUP_SIZE, CL_KERNEL_WORK_GROUP_SIZE, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. Given these values, how can I calculate the optimal work group size and number of work groups? You discover these values experimentally for your algorithm. Use a profiler to get hard numbers. I like to use CL_DEVICE_MAX_COMPUTE_UNITS as the number of work groups, because I often rely on synchronizing work items. I usually run kernels with little branching, so I take the…
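
Below is a minimal sketch (not from the original answer) of how these queries are usually combined: pick a local size from the kernel's reported limits, then round the global size up to a multiple of it. The function and variable names (pick_local_and_global, n_items) are illustrative, not from the post.

```c
/* Sketch: derive a local/global size pair from the limits named above. */
#include <CL/cl.h>

static size_t pick_local_and_global(cl_kernel kernel, cl_device_id device,
                                    size_t n_items, size_t *local_out)
{
    size_t max_wg = 1, preferred = 1;

    /* Per-kernel limit on work-group size for this device. */
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(max_wg), &max_wg, NULL);
    /* Hardware-friendly granularity (warp/wavefront multiple). */
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(preferred), &preferred, NULL);

    /* Common heuristic: largest multiple of the preferred size that fits. */
    size_t local = (max_wg / preferred) * preferred;
    if (local == 0)
        local = max_wg;

    /* Round the global size up so it divides evenly by the local size
     * (required in OpenCL 1.x); surplus work items must early-exit in the
     * kernel via a bounds check. */
    *local_out = local;
    return ((n_items + local - 1) / local) * local;
}
```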

Is it possible to access the hard disk directly from the GPU?

若如初见. submitted on 2019-11-27 11:33:40
Question: Is it possible to access a hard disk / flash disk directly from the GPU (CUDA/OpenCL) and load/store content directly from the GPU's memory? I am trying to avoid copying stuff from disk to memory and then copying it over to the GPU's memory. I read about Nvidia GPUDirect, but I am not sure if it does what I explained above. It talks about remote GPU memory and disks, but the disks in my case are local to the GPU. The basic idea is to load contents (something like DMA) -> do some operations -> store contents back…

Why did Google choose RenderScript instead of OpenCL [closed]

泄露秘密 submitted on 2019-11-27 11:19:12
I've been wondering if it was possible to use OpenCL for Android, found out that it wasn't possible, and dropped the subject altogether. But thanks to the blog post from January 14th on the official Android Developers blog (http://android-developers.blogspot.fr/2013/01/evolution-of-renderscript-performance.html), I discovered that parallel programming has been possible since Android 4.0, thanks to RenderScript, an API that has quite a few features in common with OpenCL. What I'm wondering now is: why did Google choose to implement this new solution, instead of pushing OpenCL forward (an open…

How to set up Xcode to run OpenCL code, and how to verify the kernels before building

不羁的心 submitted on 2019-11-27 10:36:07
Question: I am looking at the official documentation on the Apple site, and I see that there is a quickstart on how to use OpenCL in Xcode. Maybe it is just me, but I had no luck building the code that is mentioned in the "hello world OCL" section. I started Xcode and created an empty project, created a main.c and a .cl kernel file, pasting in what is on the Apple developer site, and I am not able to get anything to build, even after adding a target. The AD site does not have a project to download…
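
For reference, here is a minimal host program (a sketch, not the Apple sample itself) that only verifies the OpenCL runtime is reachable. On macOS the header is <OpenCL/opencl.h> and the target must link against OpenCL.framework (from the command line: clang main.c -framework OpenCL).

```c
/* Sketch: confirm Xcode/clang can see the OpenCL framework and a device. */
#include <stdio.h>
#include <OpenCL/opencl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    char name[256] = {0};

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        fprintf(stderr, "No OpenCL platform found\n");
        return 1;
    }
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "No OpenCL device found\n");
        return 1;
    }
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("OpenCL device: %s\n", name);
    return 0;
}
```

To sanity-check a .cl kernel before relying on it, one option is to load it with clCreateProgramWithSource, call clBuildProgram, and print the build log obtained via clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG.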

Questions about global and local work size

坚强是说给别人听的谎言 submitted on 2019-11-27 09:16:35
Question: Searching through the NVIDIA forums I found these questions, which are also of interest to me, but nobody had answered them in the last four days or so. Can you help? Original Forum Post: Digging into OpenCL and reading tutorials, some things stayed unclear to me. Here is a collection of my questions regarding local and global work sizes. Must the global_work_size be smaller than CL_DEVICE_MAX_WORK_ITEM_SIZES? On my machine CL_DEVICE_MAX_WORK_ITEM_SIZES = 512, 512, 64. Is CL_KERNEL_WORK_GROUP…
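
A sketch of the size relationships these questions circle around, assuming an already-created command queue and kernel (the names here are placeholders): CL_DEVICE_MAX_WORK_ITEM_SIZES constrains each dimension of the local work size, not the global one, and in OpenCL 1.x each global dimension must be a multiple of the matching local dimension.

```c
/* Sketch of launching a 2D kernel with explicit global/local sizes. */
#include <CL/cl.h>

void launch_2d(cl_command_queue queue, cl_kernel kernel)
{
    /* The global size is NOT bounded by CL_DEVICE_MAX_WORK_ITEM_SIZES; that
     * limit applies per dimension to the local size only. */
    size_t global[2] = {2048, 2048};

    /* 16 and 16 must each fit within CL_DEVICE_MAX_WORK_ITEM_SIZES[i], and
     * their product (256) must not exceed CL_KERNEL_WORK_GROUP_SIZE for this
     * kernel/device. In OpenCL 1.x each global dimension must also be a
     * multiple of the matching local dimension. */
    size_t local[2] = {16, 16};

    clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, local,
                           0, NULL, NULL);
}
```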

Using Keras & Tensorflow with AMD GPU

送分小仙女□ submitted on 2019-11-27 09:05:37
Question: I'm starting to learn Keras, which I believe is a layer on top of Tensorflow and Theano. However, I only have access to AMD GPUs, such as the AMD R9 280X. How can I set up my Python environment so that I can make use of my AMD GPUs through Keras/Tensorflow support for OpenCL? I'm running on OSX. Answer 1: I'm writing an OpenCL 1.2 backend for Tensorflow at https://github.com/hughperkins/tensorflow-cl This fork of tensorflow for OpenCL has the following characteristics: it targets any/all OpenCL 1…

Estimate OpenCL Register Use

雨燕双飞 submitted on 2019-11-27 08:42:41
Question: Is there a rule of thumb for keeping the compiler happy when it looks at a kernel and assigns registers? The compiler has a lot of flexibility, but I worry that it might start using excessive local memory if I created, say, 500 variables in my kernel, or a very long single line with a ton of operations. I know the only way my program could really examine register use on a specific device is by using the AMD SDK or the NVIDIA SDK (or comparing the assembly code to the device's architecture).
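
Short of vendor tools, one rough option is to ask the built kernel what it reports. The sketch below (placeholder names) queries CL_KERNEL_PRIVATE_MEM_SIZE and CL_KERNEL_LOCAL_MEM_SIZE, which reflect private/spilled and local memory rather than an exact register count, so treat them as a signal only. On NVIDIA, passing the vendor build option -cl-nv-verbose to clBuildProgram should also print register usage in the build log (this relies on the cl_nv_compiler_options extension).

```c
/* Sketch: query what the compiled kernel reports about its memory use. */
#include <stdio.h>
#include <CL/cl.h>

void report_kernel_memory(cl_kernel kernel, cl_device_id device)
{
    cl_ulong private_mem = 0, local_mem = 0;

    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PRIVATE_MEM_SIZE,
                             sizeof(private_mem), &private_mem, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(local_mem), &local_mem, NULL);

    printf("private mem per work-item: %llu bytes, local mem per group: %llu bytes\n",
           (unsigned long long)private_mem, (unsigned long long)local_mem);
}
```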

How to compile OpenCL on Ubuntu?

为君一笑 submitted on 2019-11-27 08:27:29
Question: What headers and drivers are needed, and where would I get them, for compiling OpenCL on Ubuntu using gcc/g++? Info: For a while now I've been stumbling around trying to figure out how to install OpenCL on my desktop and, if possible, my netbook. There are a couple of tutorials out there that I've tried, but none seem to work. Also, they all just give a step-by-step without really explaining the why behind the what, or even worse, they are specific to a particular IDE, so you have to learn the IDE to be able to do anything. So I have an NVIDIA GX465 in my desktop and integrated graphics…
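
As a concrete baseline, here is a minimal compile check (a sketch; package names assume a reasonably recent Ubuntu): the opencl-headers package provides CL/cl.h, ocl-icd-opencl-dev provides the libOpenCL ICD loader to link against, and the vendor driver (e.g. NVIDIA's) supplies the actual platform.

```c
/*
 * Minimal host-side compile check.
 * Build with:  gcc hello_cl.c -o hello_cl -lOpenCL
 */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);

    printf("clGetPlatformIDs returned %d, %u platform(s) available\n",
           err, num_platforms);
    return (err == CL_SUCCESS && num_platforms > 0) ? 0 : 1;
}
```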

How do I determine available device memory in OpenCL?

烂漫一生 submitted on 2019-11-27 07:46:48
Question: I would like to know how much free memory there is on my device before allocating buffers. Is this possible? I know there's CL_DEVICE_GLOBAL_MEM_SIZE for total memory and CL_DEVICE_MAX_MEM_ALLOC_SIZE for the maximum size of a single object, but I would like to know the current memory state. As it stands, I'm probably going to have to use OpenGL vendor-specific extensions. Answer 1: No, there is no way, and there is no need to know it; GPU memory can be virtualized and the driver will swap memory in/out…
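
For completeness, a sketch of the two queries the question already names; core OpenCL exposes only these static limits, so a "free memory right now" figure indeed requires vendor-specific extensions, as the answer says.

```c
/* Sketch: print the static memory limits a device reports. */
#include <stdio.h>
#include <CL/cl.h>

void print_mem_limits(cl_device_id device)
{
    cl_ulong global_mem = 0, max_alloc = 0;

    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);

    printf("total global memory: %llu MiB, max single allocation: %llu MiB\n",
           (unsigned long long)(global_mem >> 20),
           (unsigned long long)(max_alloc >> 20));
}
```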

Error -1001 in clGetPlatformIDs Call!

心已入冬 submitted on 2019-11-27 07:35:12
Question: I am trying to start working with OpenCL. I have two NVidia graphics cards; I installed the "developer driver" as well as the SDK from the NVidia website. I compiled the demos, but when I run ./oclDeviceQuery I see: OpenCL SW Info: Error -1001 in clGetPlatformIDs Call !!! How can I fix it? Does it mean my NVidia cards cannot be detected? I am running Ubuntu 10.10 and the X server works properly with the nvidia driver. I am pretty sure the problem is not related to file permissions, as it doesn't work with sudo…
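
A sketch of checking for this failure explicitly: error -1001 is CL_PLATFORM_NOT_FOUND_KHR from the ICD loader (cl_khr_icd), meaning no vendor .icd files were found (usually under /etc/OpenCL/vendors/), which points to an incomplete driver/SDK installation rather than a bug in the calling code.

```c
/* Sketch: detect the "no platforms" case before doing anything else. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;
    cl_int err = clGetPlatformIDs(0, NULL, &num_platforms);

    if (err != CL_SUCCESS || num_platforms == 0) {
        fprintf(stderr, "clGetPlatformIDs failed (err=%d, platforms=%u); "
                        "check that the NVIDIA ICD file is installed\n",
                err, num_platforms);
        return 1;
    }
    printf("%u OpenCL platform(s) found\n", num_platforms);
    return 0;
}
```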