OpenCL

OpenCL CPU Device vs GPU Device

心已入冬 submitted on 2019-11-28 23:06:13
Question: Consider a simple example: vector addition. If I build a program for CL_DEVICE_TYPE_GPU, and I build the same program for CL_DEVICE_TYPE_CPU, what is the difference between them (other than that the "CPU program" runs on the CPU and the "GPU program" runs on the GPU)? Thanks for your help. Answer 1: There are a few differences between the device types. The short answer to your vector question: use a GPU for large vectors and a CPU for smaller workloads. 1) Memory copying. GPUs rely on the data you are …
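The vector-add example the question and answer refer to can be sketched as follows. The kernel source and the scalar reference below are illustrative (the names kVecAddSrc and vec_add_ref are mine, not from the original thread):

```c
#include <stddef.h>

/* OpenCL C source for the kernel. The identical source can be built
   for a CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU device; only the
   device it is compiled for and dispatched to differs. */
static const char *kVecAddSrc =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Scalar CPU reference: the same computation the kernel performs one
   element per work-item, written as a plain loop for validation. */
static void vec_add_ref(const float *a, const float *b,
                        float *c, size_t n) {
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The answer's rule of thumb follows mostly from the host-to-device copies: for small n the transfer cost dominates, so the plain loop (or a CPU device, which shares host memory) wins; for large n the GPU's parallelism amortizes the copies.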

Is it possible to access hard disk directly from gpu?

旧时模样 submitted on 2019-11-28 18:24:44
Is it possible to access a hard disk / flash disk directly from the GPU (CUDA/OpenCL) and load/store content directly from the GPU's memory? I am trying to avoid copying data from disk to host memory and then copying it over to the GPU's memory. I read about NVIDIA GPUDirect, but I am not sure whether it does what I described above. It talks about remote GPU memory and disks, but the disks in my case are local to the GPU. The basic idea is to load contents (something like DMA) -> do some operations -> store contents back to disk (again in DMA fashion). I am trying to involve the CPU and RAM as little as possible here. Please …

How to set up Xcode to run OpenCL code, and how to verify the kernels before building

a 夏天 submitted on 2019-11-28 17:19:12
I am looking at the official documentation on the Apple site, and I see that there is a quickstart on how to use OpenCL in Xcode. Maybe it is just me, but I had no luck building the code that is mentioned in the "hello world OCL" section. I started Xcode and created an empty project; I created a main.c and a .cl kernel file, pasting in what is on the Apple developer site, and I am not able to get anything to build, even after adding a target. The Apple Developer site does not have a project to download, so I have no clue about the cause of the failure (it may be me, most likely, or the site assumes steps …

Questions about global and local work size

蓝咒 submitted on 2019-11-28 15:35:23
Searching through the NVIDIA forums I found these questions, which are also of interest to me, but nobody had answered them in the last four days or so. Can you help? Original Forum Post: Digging into OpenCL and reading tutorials, some things remained unclear to me. Here is a collection of my questions regarding local and global work sizes. Must the global_work_size be smaller than CL_DEVICE_MAX_WORK_ITEM_SIZES? On my machine CL_DEVICE_MAX_WORK_ITEM_SIZES = 512, 512, 64. Is CL_KERNEL_WORK_GROUP_SIZE the recommended work_group_size for the kernel in use? Or is this the only work_group_size the GPU …
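On the relationship between the two sizes: in OpenCL 1.x the global work size must be an exact multiple of the local work size, and CL_KERNEL_WORK_GROUP_SIZE is a per-kernel upper bound reported by the compiler, not a recommendation. A common pattern is to round the problem size up and add a bounds check inside the kernel; a minimal sketch (the helper name round_up_global is mine):

```c
#include <stddef.h>

/* Round the problem size up to the next multiple of the chosen local
   work size, so the pair is legal to pass to clEnqueueNDRangeKernel
   under OpenCL 1.x. The kernel then guards the tail, e.g.
   "if (get_global_id(0) >= n) return;". This helper is a sketch,
   not part of the OpenCL API. */
static size_t round_up_global(size_t problem_size, size_t local_size) {
    size_t remainder = problem_size % local_size;
    return remainder == 0 ? problem_size
                          : problem_size + (local_size - remainder);
}
```

With a local size of 128 and a problem size of 1000, for example, this yields a global size of 1024, and the last 24 work-items simply return early.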

OpenCL / AMD: Deep Learning [closed]

坚强是说给别人听的谎言 submitted on 2019-11-28 15:20:55
While "googl'ing" and doing some research I were not able to find any serious/popular framework/sdk for scientific GPGPU-Computing and OpenCL on AMD hardware. Is there any literature and/or software I missed? Especially I am interested in deep learning . For all I know deeplearning.net recommends NVIDIA hardware and CUDA frameworks. Additionally all big deep learning frameworks I know, such as Caffe , Theano , Torch , DL4J , ... are focussed on CUDA and do not plan to support OpenCL/AMD . Furthermore one can find plenty of scientific papers as well as corresponding literature for CUDA based

Using Keras & Tensorflow with AMD GPU

房东的猫 submitted on 2019-11-28 15:16:07
I'm starting to learn Keras, which I believe is a layer on top of Tensorflow and Theano. However, I only have access to AMD GPUs, such as the AMD R9 280X. How can I set up my Python environment so that I can make use of my AMD GPUs through Keras/Tensorflow support for OpenCL? I'm running on OSX. I'm writing an OpenCL 1.2 backend for Tensorflow at https://github.com/hughperkins/tensorflow-cl . This fork of Tensorflow for OpenCL has the following characteristics: it targets any/all OpenCL 1.2 devices. It doesn't need OpenCL 2.0, doesn't need SPIR-V or SPIR, and doesn't need Shared Virtual Memory. And so …

Speed up OpenCV image processing with OpenCL

为君一笑 submitted on 2019-11-28 15:08:37
This article was first published on my personal blog at https://kezunlin.me/post/59afd8b3/ ; you are welcome to read the latest content there! Speed up OpenCV image processing with OpenCL. Guide: OpenCL is a framework for writing programs that execute on heterogeneous platforms. The developers of an OpenCL library utilize all the OpenCL-compatible devices (CPUs, GPUs, DSPs, FPGAs, etc.) they find on a computer/device and assign the right tasks to the right processor. Keep in mind that as a user of the OpenCV library you are not developing any OpenCL library. In fact, you are not even a user of the OpenCL library, because all the details are hidden behind the transparent API …

How do I determine available device memory in OpenCL?

ぃ、小莉子 submitted on 2019-11-28 13:32:06
I would like to know how much free memory there is on my device before allocating buffers. Is this possible? I know there's CL_DEVICE_GLOBAL_MEM_SIZE for total memory and CL_DEVICE_MAX_MEM_ALLOC_SIZE for the maximum size of a single object, but I would like to know the current memory state. As it stands, I'm probably going to have to use vendor-specific OpenGL extensions. No, there is no way, and there is no need to know it: GPU memory can be virtualized, and the driver will swap memory in and out of the GPU as it is or is not needed. You can use GL_NVX_gpu_memory_info on NVIDIA. Source: https://stackoverflow …

Error -1001 in clGetPlatformIDs Call!

我与影子孤独终老i submitted on 2019-11-28 13:17:01
I am trying to start working with OpenCL. I have two NVIDIA graphics cards; I installed the "developer driver" as well as the SDK from the NVIDIA website. I compiled the demos, but when I run ./oclDeviceQuery I see: OpenCL SW Info: Error -1001 in clGetPlatformIDs Call !!! How can I fix it? Does it mean my NVIDIA cards cannot be detected? I am running Ubuntu 10.10, and the X server works properly with the NVIDIA driver. I am pretty sure the problem is not related to file permissions, as it doesn't work with sudo either. In my case I solved it by installing the nvidia-modprobe package available in Ubuntu (utopic …

Measuring execution time of OpenCL kernels

[亡魂溺海] submitted on 2019-11-28 11:12:27
I have the following loop that measures the time of my kernels:

    double elapsed = 0;
    cl_ulong time_start, time_end;
    for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i) {
        err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, &event);
        checkErr(err, "Kernel run");
        err = clWaitForEvents(1, &event);
        checkErr(err, "Kernel run wait for event");
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
        checkErr(err, "Kernel run get time start");
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time …
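For reference, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END are cl_ulong timestamps in device nanoseconds, so the elapsed time per iteration is simply their difference. A minimal conversion helper, sketched without the CL headers (the typedef below stands in for cl_ulong and is an assumption of this sketch):

```c
/* Stand-in for cl_ulong (a 64-bit unsigned integer) so this sketch
   compiles without CL/cl.h. */
typedef unsigned long long timestamp_ns;

/* Convert the difference of two event-profiling timestamps, which the
   OpenCL runtime reports in nanoseconds, into milliseconds. */
static double elapsed_ms(timestamp_ns start, timestamp_ns end) {
    return (double)(end - start) / 1.0e6; /* ns -> ms */
}
```

In the loop above one would accumulate `elapsed += elapsed_ms(time_start, time_end);` after the second clGetEventProfilingInfo call, and note that the queue must be created with CL_QUEUE_PROFILING_ENABLE for these queries to succeed.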