opencl

OpenCL: Manually throw an exception in kernel

蓝咒 submitted on 2019-12-10 16:09:56
Question: Is it possible to manually throw an exception in OpenCL, just for debugging purposes? I am getting a very strange error in my code: when I compute two double values and add them up, the host reports "CL_OUT_OF_RESOURCES". However, if I don't add these two values, the host doesn't report any error. Answer 1: Exceptions are not supported in OpenCL, which is based on the C99 language. On AMD GPUs you can use printf inside the kernel; see the cl_amd_printf extension. To use it, put this at the top of your
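Because OpenCL C has no exceptions, one common substitute (a general technique, not something the answer above prescribes) is an error-flag buffer: each work-item writes a status code into a `__global int` array instead of throwing, and the host reads that array back after the kernel finishes. The sketch below emulates the pattern on the CPU in plain C so it runs without a device; the kernel function `add_doubles_kernel`, the error codes, and the loop standing in for an NDRange launch are all hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical status codes a kernel could report instead of throwing. */
enum { KERR_OK = 0, KERR_NONFINITE = 1 };

/* Stand-in for an OpenCL kernel: in real code this would be a __kernel
   function and gid would come from get_global_id(0). */
static void add_doubles_kernel(const double *a, const double *b, double *out,
                               int *err, size_t gid)
{
    double sum = a[gid] + b[gid];
    if (!isfinite(sum)) {
        err[gid] = KERR_NONFINITE;   /* record the problem and bail out */
        out[gid] = 0.0;
        return;
    }
    err[gid] = KERR_OK;
    out[gid] = sum;
}

/* Host side: "launch" the kernel over n work-items, then scan the error
   buffer. Returns the index of the first failing work-item, or -1. */
static long run_and_check(const double *a, const double *b, double *out,
                          int *err, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        add_doubles_kernel(a, b, out, err, gid);
    for (size_t i = 0; i < n; ++i)
        if (err[i] != KERR_OK)
            return (long)i;
    return -1;
}
```

With inputs `{1.0, 2.0, HUGE_VAL}` and `{0.5, 0.25, 1.0}`, `run_and_check` reports work-item 2, whose sum overflows to infinity, while the other outputs are computed normally.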

How to pass an array in a structure to the kernel?

限于喜欢 submitted on 2019-12-10 15:36:35
Question: I need to pass to the kernel an array of structures, each of which contains an array. In the end, the data is partly correct, but some of it is wrong. I have this code on the host: struct myStruct { int a; double b; double c[5]; }; myStruct *result = new myStruct[countOptim]; for (int i = 0; i < countOptim; i++) { result[i].a = 5; result[i].b = 11.5; for (int j = 0; j < 5; j++) { result[i].c[j] = j; } } // Make kernel Kernel kernel(program, "vector_add"); Buffer bufferResult = Buffer(context,
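Since the answer is not shown, here is one general thing worth checking when struct data arrives "partly correct": a layout mismatch between the host compiler and the OpenCL compiler. The C sketch below mirrors the question's `myStruct` with an explicit padding field (the name `pad` is hypothetical) so that `b` sits at offset 8 regardless of how either compiler aligns `double`, and uses C11 `_Static_assert` with `offsetof` to pin the layout down. It assumes the device supports `double` (for example via `cl_khr_fp64`) and that the kernel declares the identical struct.

```c
#include <stddef.h>

/* Host-side mirror of the question's myStruct. The explicit padding field
   makes the 4 bytes after 'a' visible, so the kernel-side definition can
   copy it verbatim and both compilers agree on every offset. */
typedef struct {
    int    a;
    int    pad;    /* hypothetical name: keeps 'b' at offset 8 */
    double b;
    double c[5];
} myStruct;

/* Compile-time checks: if any of these fail, host and device may disagree. */
_Static_assert(offsetof(myStruct, a) == 0,  "a must start at 0");
_Static_assert(offsetof(myStruct, b) == 8,  "b must start at 8");
_Static_assert(offsetof(myStruct, c) == 16, "c must start at 16");
_Static_assert(sizeof(myStruct) == 56,      "struct must be 56 bytes");

/* Fill the array exactly as the question's host code does. */
static void fill_structs(myStruct *result, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        result[i].a = 5;
        result[i].pad = 0;
        result[i].b = 11.5;
        for (int j = 0; j < 5; ++j)
            result[i].c[j] = j;
    }
}
```

Under these assumptions the device buffer would be sized as `sizeof(myStruct) * countOptim` bytes.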

Generic OpenCL stencil kernel and host

喜夏-厌秋 submitted on 2019-12-10 15:14:25
Question: I am new to OpenCL. I would like to write a generic kernel that I can later extend to other non-coalesced memory access patterns, pairing it for now with a rectangular stencil pattern for simplicity (which also avoids out-of-bounds access). This kernel controls the use of local memory (__local float *lmem). So far, I have structured my .cl file as below: __kernel void kmain(__global float *in, __global float *out, __global float *in2, __local float *lmem) { int wg_x = get_group_id(0); int wg
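When developing a tiled stencil kernel like this, it helps to have a plain CPU reference to diff the kernel's output against. The sketch below assumes a 3x3 box (mean) stencil over a row-major `w` x `h` grid and copies border cells unchanged so no read leaves the grid; the function name `stencil3x3_ref` and the choice of stencil are illustrative, not taken from the question.

```c
#include <stddef.h>

/* Reference 3x3 box stencil on a row-major grid of width w and height h.
   Interior cells get the mean of their 3x3 neighbourhood; border cells are
   copied unchanged, so every read stays inside the grid. */
static void stencil3x3_ref(const float *in, float *out, size_t w, size_t h)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            size_t idx = y * w + x;
            if (x == 0 || y == 0 || x + 1 == w || y + 1 == h) {
                out[idx] = in[idx];                 /* border: copy */
                continue;
            }
            float sum = 0.0f;
            for (size_t yy = y - 1; yy <= y + 1; ++yy)   /* y >= 1 here */
                for (size_t xx = x - 1; xx <= x + 1; ++xx)
                    sum += in[yy * w + xx];
            out[idx] = sum / 9.0f;
        }
    }
}
```

A kernel version would typically have each work-group load its tile plus a one-cell halo into `lmem`, synchronize with a barrier, and then compute the same sum from local memory; its output can be compared element by element with this reference.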

What is a host in OpenCL?

寵の児 submitted on 2019-12-10 13:48:48
Question: I have just started learning OpenCL. I am working through a tutorial, but I can't really grasp the idea of the host. Could someone explain? Thank you. Answer 1: OpenCL is a system designed to support massively parallel processing such as can be performed by modern graphics chips (GPUs). In the OpenCL paradigm, a "host program" is the outer control logic that performs the configuration for a GPU-based application. This host program would normally run on a general-purpose CPU (such as the x86-compatible main
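To make the host/kernel split concrete, the following plain-C sketch emulates it on the CPU: the host function owns the data, chooses the global work size, and "enqueues" the kernel by invoking it once per work-item index, which is roughly the role `clEnqueueNDRangeKernel` plays in real host code. The names `vector_add_kernel` and `host_run` are hypothetical, and no OpenCL API is called.

```c
#include <stddef.h>

/* "Device" code: one work-item computes one element. In OpenCL C this
   would be a __kernel function that obtains gid from get_global_id(0). */
static void vector_add_kernel(const float *a, const float *b, float *c,
                              size_t gid)
{
    c[gid] = a[gid] + b[gid];
}

/* "Host" code: prepares the launch, picks the global work size, runs the
   kernel over that range, and then owns the results. A real host program
   would instead create a context, buffers and a command queue, and call
   clEnqueueNDRangeKernel. */
static void host_run(const float *a, const float *b, float *c, size_t n)
{
    size_t global_size = n;                  /* one work-item per element */
    for (size_t gid = 0; gid < global_size; ++gid)
        vector_add_kernel(a, b, c, gid);     /* emulated kernel launch */
}
```

On a real device the work-items run in parallel and in no guaranteed order, which is why the kernel body must not depend on the sequential loop used here.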

AMD vs. NVIDIA: how do they differ in terms of OpenCL support?

橙三吉。 submitted on 2019-12-10 12:12:30
Question: I have an EC2 instance. Its specs are: g2.2xlarge instance, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, NVIDIA GRID GPU (Kepler GK104), with Ubuntu 14.04 64-bit. I have two questions: 1. After installing the CUDA toolkit on this system, I get the following output from clinfo: clinfo: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by clinfo) Platform Version: OpenCL 1.2 CUDA 8.0.46 Platform Name: NVIDIA CUDA Platform Vendor:
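When comparing vendors it can help to check the supported version programmatically. The OpenCL specification defines the `CL_PLATFORM_VERSION` string returned by `clGetPlatformInfo` as `OpenCL<space><major.minor><space><platform-specific information>`, which matches the `OpenCL 1.2 CUDA 8.0.46` output above. The sketch below parses such a string with `sscanf`; the function name `parse_cl_version` is hypothetical, and querying a real platform is left out so the code runs without a driver.

```c
#include <stdio.h>

/* Parse "OpenCL <major>.<minor> <platform-specific info>" into major and
   minor. Returns 1 on success, 0 if the string does not follow that form. */
static int parse_cl_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}
```

For the string above, `parse_cl_version` yields major 1 and minor 2, so features introduced in OpenCL 2.0 cannot be assumed on this platform.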

OpenCL speed and floating-point precision

半城伤御伤魂 submitted on 2019-12-10 10:22:58
Question: I have just started working with OpenCL. However, I have found some weird behavior in OpenCL that I can't understand. The source I built and tested was http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . I have an ATI Radeon HD 4770 and an AMD FX-6200 3.8 GHz 6-core CPU. Speed: First, the speed does not scale linearly with the maximum number of work-group items. I ran App Profiler to analyze the time spent during kernel execution. The result was a bit shocking: my GPU
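On the precision side, a small CPU experiment shows the kind of discrepancy to expect when one code path uses `float` and a reference uses `double`: accumulating 0.1 one million times drifts far more in single precision. This is a generic illustration in C, not a reproduction of the results in the question; the iteration count and the function name `accumulate_errors` are arbitrary choices.

```c
/* Sum 0.1 n times in single and in double precision and report the
   absolute error of each against the exact value n * 0.1. */
static void accumulate_errors(long n, double *err_float, double *err_double)
{
    float  sf = 0.0f;
    double sd = 0.0;
    for (long i = 0; i < n; ++i) {
        sf += 0.1f;   /* each add rounds to a 24-bit significand */
        sd += 0.1;    /* each add rounds to a 53-bit significand */
    }
    double exact = (double)n * 0.1;
    double df = (double)sf - exact;
    double dd = sd - exact;
    *err_float  = df < 0 ? -df : df;
    *err_double = dd < 0 ? -dd : dd;
}
```

With `n = 1000000` the single-precision error is in the hundreds while the double-precision error stays far below 0.001, so a GPU result computed in `float` should be compared against a CPU result with a tolerance rather than for exact equality.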

float vs. floatN

天大地大妈咪最大 submitted on 2019-12-10 10:10:07
Question: Is there any advantage to using floatN instead of float in OpenCL? For example, float3 position; versus float posX, posY, posZ; Thank you. Answer 1: It depends on the hardware. NVIDIA GPUs have a scalar architecture, so vectors provide little advantage on them over writing purely scalar code. Quoting the NVIDIA OpenCL best practices guide (PDF link): The CUDA architecture is a scalar architecture. Therefore, there is no performance benefit from using vector types and instructions. These should only be
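One difference that holds regardless of the hardware is memory layout: the OpenCL C specification gives 3-component vectors such as `float3` the size and alignment of the corresponding 4-component type, so a `float3` occupies 16 bytes while three separate `float`s occupy 12. The C sketch below mirrors that on the host; `pos_scalars` and `float3_mirror` are hypothetical names (the OpenCL host headers provide `cl_float3`, which is likewise 16 bytes), and `_Alignas` requires C11.

```c
#include <stddef.h>

/* Three separate scalars, as in: float posX, posY, posZ; */
typedef struct {
    float x, y, z;
} pos_scalars;

/* Host-side mirror of an OpenCL C float3: same size and alignment as a
   float4, so the fourth slot is padding. */
typedef struct {
    _Alignas(16) float s[4];   /* s[0..2] = x, y, z; s[3] unused */
} float3_mirror;

_Static_assert(sizeof(pos_scalars) == 12, "three floats pack into 12 bytes");
_Static_assert(sizeof(float3_mirror) == 16, "float3 occupies 16 bytes");

/* Bytes needed for an array of n positions in each representation. */
static size_t bytes_scalars(size_t n) { return n * sizeof(pos_scalars); }
static size_t bytes_float3(size_t n)  { return n * sizeof(float3_mirror); }
```

An array of positions stored as `float3` therefore moves a third more bytes than packed scalars, which can matter in bandwidth-bound kernels even on hardware where vector arithmetic brings no speedup.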

Variable in OpenCL kernel 'for-loop' reduces performance

与世无争的帅哥 submitted on 2019-12-10 09:59:03
Question: I have a for-loop in my kernel that I had hard-coded to iterate a fixed number of times: for (int kk = 0; kk < 50000; kk++) { <... my code here ...> } I don't think the code in the loop is relevant to my question; it's some pretty simple table look-ups and integer math. I wanted to make my kernel code a little more flexible, so I modified the loop so that the number of iterations (50000) is replaced with a kernel input parameter 'num_loops'. for (int kk = 0; kk < num
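A plausible explanation (hedged, since the answer is not shown) is that a literal bound lets the compiler unroll or otherwise specialize the loop, whereas a runtime `num_loops` does not. One workaround that keeps flexibility is to make the count a preprocessor macro and supply it when building the program, for example by passing `-D NUM_LOOPS=50000` in the options string of `clBuildProgram`. The C sketch below shows the pattern; the macro name `NUM_LOOPS`, the helper names, and the placeholder loop body are all hypothetical.

```c
#include <stdint.h>

/* The loop count is fixed at compile time. When building an OpenCL
   program it would come from the build options (e.g. "-D NUM_LOOPS=50000");
   a default is provided here so the file compiles on its own. */
#ifndef NUM_LOOPS
#define NUM_LOOPS 50000
#endif

/* Placeholder for the question's "table look-ups and integer math". */
static uint32_t loop_body(uint32_t acc, int kk)
{
    return acc * 1664525u + (uint32_t)kk;
}

/* Compile-time bound: the compiler knows the trip count exactly. */
static uint32_t run_fixed(uint32_t seed)
{
    uint32_t acc = seed;
    for (int kk = 0; kk < NUM_LOOPS; kk++)
        acc = loop_body(acc, kk);
    return acc;
}

/* Runtime bound, as in the modified kernel: same result, but the trip
   count is only known when the function is called. */
static uint32_t run_param(uint32_t seed, int num_loops)
{
    uint32_t acc = seed;
    for (int kk = 0; kk < num_loops; kk++)
        acc = loop_body(acc, kk);
    return acc;
}
```

The trade-off is that changing the count means rebuilding the program, which is usually acceptable when the value changes rarely compared with how often the kernel runs.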

How to happily compile Alibaba's MNN on Windows

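有些话、适合烂在心里 submitted on 2019-12-10 06:39:13
There are now many high-quality open-source frameworks for deploying deep learning on edge devices, for example Baidu's PaddlePaddle-lite, Alibaba's MNN, and Tencent's ncnn. After reading many benchmarks, I eventually chose Alibaba's MNN to study. However, I am not particularly satisfied with the toolchain Alibaba provides for it, and I suspect Windows users struggle with it the most. (I strongly recommend that the maintainers implement the helper scripts in a single, unified scripting language.) In this article, I will show how to modify the source so that MNN compiles painlessly on Windows (in fact, once modified, the build needs no extra steps and also works on other operating systems). See also the original documentation: https://www.yuque.com/mnn/cn/build_windows Before starting, make sure your system has at least the following: Microsoft Visual Studio (2017 or later; note that I am using VS2015 here, which requires replacing /WX with /WX- in 3rd_party/flatbuffers/CMakeLists.txt); cmake (version 3.10 or later recommended); the Android SDK (if you need to build for Android on Windows). 1. First, I implemented a generate.py script in Python under schema. Note: I am using Python 3.x. #-*-coding:utf-8-*- #coding by: yuangu(lifulinghan@aol.com) import os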

Install OpenCL (AMD SDK kit) on Linux without root privileges

流过昼夜 submitted on 2019-12-10 06:12:29
Question: I am trying to install OpenCL (AMD) on Linux, but I am stuck on the last step (installing the ICD). It seems the ICD has to be installed in /etc/OpenCL/vendors, but I don't have root access to the computer. Is there any way to make OpenCL work without installing the ICD (or perhaps an environment variable that adds a search path for ICD files)? It just seems really inconvenient for people like us that the ICD file path is hardcoded. Answer 1: Put the ICD files in /some/path/icd and then export the path like so: