opencl | 易学教程

clCreateImage2D - Load RGB image

Question: I'm trying to load an OpenCV image (IplImage) onto the GPU with clCreateImage2D. I use IplImage because I want to load any kind of image (jpg, bmp, png). I can load the image using clCreateImage2D with the CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR mem_flags, the CL_RGB channel order, and the CL_UNORM_SHORT_565 data type. But in the kernel function, read_imagef does not accept the CL_UNORM_SHORT_565 type. So how can I send an RGB image to an OpenCL kernel function? Edit: I converted the input image to 32-bit. But now what would
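The edit in the question mentions converting the input image to 32 bits per pixel. A minimal sketch of that padding step follows, written in Python for brevity. It assumes tightly packed 8-bit pixels with three channels and no row padding (an IplImage may have padded rows and, by OpenCV's default, BGR channel order, so a real conversion would need to account for both). The function name rgb_to_rgba is hypothetical. The resulting buffer has the layout that a four-channel, 8-bit-per-channel image format such as CL_RGBA with CL_UNORM_INT8 expects.

```python
def rgb_to_rgba(rgb: bytes, alpha: int = 255) -> bytes:
    """Expand tightly packed 8-bit RGB pixels to 8-bit RGBA.

    Hypothetical helper: assumes no row padding and 3 bytes per pixel.
    """
    if len(rgb) % 3 != 0:
        raise ValueError("RGB buffer length must be a multiple of 3")
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must fit in one byte")

    pixel_count = len(rgb) // 3
    rgba = bytearray(pixel_count * 4)
    # Copy each colour channel into its slot of the 4-byte pixel.
    rgba[0::4] = rgb[0::3]
    rgba[1::4] = rgb[1::3]
    rgba[2::4] = rgb[2::3]
    # Fill the fourth channel with a constant alpha value.
    rgba[3::4] = bytes([alpha]) * pixel_count
    return bytes(rgba)
```

The returned bytes could then be passed as the host pointer when creating the image; that host-side call is not shown here.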

Where is the buffer allocated in opencl?

Question: I was trying to create a memory buffer in OpenCL with the C++ bindings. The statement looks like cl::Buffer buffer(context,CL_MEM_READ_ONLY,sizeof(float)*(100)); This statement confuses me because it doesn't specify which device the memory is allocated on. In principle, the context contains all devices, including CPU and GPU, on the chosen platform. Is it true that the buffer is put in a common region shared by all the devices? Answer 1: The spec does not define where the memory is. For the API user, it is

Gaussian distributed random numbers in OpenCL

Question: I am running a computationally expensive task on the GPU using OpenCL. This task requires many random numbers generated within each worker. Some of those random numbers are supposed to be uniformly distributed within a certain interval, but others have to be Gaussian distributed around a (changing) value. Is there any library for this? If not, what's an easy way to implement such a thing? So far I have been creating the random numbers in Python and passing them to OpenCL. However the
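One common way to turn uniform random numbers into Gaussian ones is the Box-Muller transform. The excerpt does not say which method the answers recommend, so the sketch below is only an illustration of that standard technique, written in Python like the host code the question mentions. The names gaussian_pair and sample_gaussians are hypothetical, and a source of uniform numbers per worker is assumed to exist already. The same arithmetic (log, sqrt, cos, sin) could be written inside a kernel.

```python
import math
import random


def gaussian_pair(u1, u2, mean=0.0, sigma=1.0):
    """Box-Muller: map two uniforms to two independent normal samples.

    u1 must lie in (0, 1] so that log(u1) is defined; u2 lies in [0, 1).
    """
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    z0 = radius * math.cos(angle)
    z1 = radius * math.sin(angle)
    # Shift and scale the standard normals to the requested distribution.
    return mean + sigma * z0, mean + sigma * z1


def sample_gaussians(count, mean, sigma, rng):
    """Draw `count` samples using a uniform generator with a .random() method."""
    values = []
    while len(values) < count:
        u1 = 1.0 - rng.random()  # rng.random() is in [0, 1), so u1 is in (0, 1]
        u2 = rng.random()
        values.extend(gaussian_pair(u1, u2, mean, sigma))
    return values[:count]
```

Because the mean is an argument of each call, a value that changes between draws, as described in the question, needs no extra state.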

How to happily build Alibaba's MNN on Windows

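Many high-quality open-source frameworks now exist for deploying deep learning on end devices, for example Baidu's PaddlePaddle-lite, Alibaba's MNN, and Tencent's ncnn. After reading many benchmarks, I finally chose Alibaba's MNN to study. However, I am not entirely satisfied with the toolchain Alibaba provides, and I suspect Windows users struggle with it the most. (I strongly suggest that the maintainers implement the helper scripts in a single scripting language.) In this article I describe how to modify the source so that MNN builds happily on Windows (in fact, the modified source also works on other operating systems without any extra steps). Please also refer to the original documentation: https://www.yuque.com/mnn/cn/build_windows

Before starting the tutorial, make sure your system has at least the following environment:

Microsoft Visual Studio (2017 or later). Note: I use VS2015 here, which requires changing /WX to /WX- in 3rd_party/flatbuffers/CMakeLists.txt.
cmake (version 3.10 or later recommended)
android sdk (if you need to build the Android SDK on Windows)

1. First, under schema, I implemented a generate.py script in Python. Note: I use Python 3.x.

#-*-coding:utf-8-*-
#coding by: yuangu(lifulinghan@aol.com)
import os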

Conversion of YUV data into Image format Opencl

Question: I have been working on a project where I use YUV as input and have to pass this data to the kernel for processing. I looked into similar questions but never found an accurate answer to my concern. I have tried a simple method to convert the YUV data into an image format for OpenCL processing. However, when I print the data that has been converted into the image, I get the first value correct, then three zeroes, and then the 5th pixel value correct.
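The pattern described (one correct value, three zeroes, then the next correct value) is what a flat dump looks like when each single-channel sample occupies the first channel of a four-channel pixel. Whether that is what happened in the asker's code cannot be determined from this excerpt, so the Python sketch below is only an illustration of that layout, not a diagnosis. The function name luma_to_rgba_texels is hypothetical.

```python
def luma_to_rgba_texels(y_plane: bytes) -> bytes:
    """Store each Y sample in the first channel of its own 4-channel pixel.

    Hypothetical illustration: the other three channels stay zero, so a
    flat dump of the result reads y0, 0, 0, 0, y1, 0, 0, 0, ...
    """
    texels = bytearray(len(y_plane) * 4)
    texels[0::4] = y_plane
    return bytes(texels)


# Two luma samples become eight bytes when dumped channel by channel.
print(list(luma_to_rgba_texels(bytes([16, 32]))))
```

If the data really is laid out this way, reading the output back one pixel at a time rather than one channel at a time would show consecutive samples.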

Compute Prof's fields for incoherent and coherent gst/gld? (CUDA/OpenCL)

Question: I am using Compute Prof 3.2 and a GeForce GTX 280, so I believe I have compute capability 1.3. This file seems to show that I should be able to see these fields, since I am using a 1.x compute device. However, I don't see them, and the User Guide for the 3.2 toolkit says I can't see them, but calls them gst_uncoalesced and gst_coalesced. To sum up, I am confused about how I should figure out from the profiler whether I am making non-coalesced reads from global memory. It doesn't look like Fermi cards will say either, but I am not worried about them for now. If anybody can elaborate on the situation I

SIMD intrinsics - are they usable on gpus?

Question: I'm wondering if I can use SIMD intrinsics in GPU code such as a CUDA kernel or an OpenCL one. Is that possible? Answer 1: No, SIMD intrinsics are just tiny wrappers for ASM code. They are CPU specific. More about them here. Generally speaking, why would you do that? CUDA and OpenCL already contain many "functions" which are actually "GPU intrinsics" (all of these, for example, are single-point-math intrinsics for the GPU) Answer 2: You use the vector data types built into the OpenCL C language. For

OpenCL Simple “Hello World!” program compiles correctly but spits out garbage when executed

Question: As the title suggests, I have copied verbatim the hello.cl and hello.c files from Fixstars' online OpenCL book, at http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/first-opencl-program.html, and cannot get correct output. I compile the program using gcc -lOpenCL hello.c -o hello. I execute it normally with ./hello. But my output reads something like ��. I run Arch Linux and have installed OpenCL, the headers, and the NVIDIA implementation. I would like to continue learning OpenCL

opencl image2d_t doesn't write back values

Question: Windows 7, AMD App SDK 2.6, Asic: Redwood. I am trying to write a simple pass-thru kernel to see what the issue is and I can't seem to find what the error might be.

    void kernel_test(CLManager* clMgr, int W, int H) {
        cl::ImageFormat format;
        format.image_channel_order = CL_RGBA;
        format.image_channel_data_type = CL_FLOAT;
        cl_float4* inp = new cl_float4[W * H];
        for (int i = 0; i < W * H; ++i) {
            inp[i].s[0] = 1.0f;
            inp[i].s[1] = 0.0f;
            inp[i].s[2] = 0.0f;
            inp[i].s[3] = 1.0f;
        }
        cl_float4* oup = new cl