opencl

Building Tensorflow with OpenCL support fails on Ubuntu 18.04

蓝咒 submitted on 2019-12-04 20:32:14
While trying to compile TensorFlow on Ubuntu 18.04 with this configuration, I'm running into this error: ERROR: /home/joao/Documents/playground/tensorflow/tensorflow/contrib/tensor_forest/hybrid/BUILD:72:1: C++ compilation of rule '//tensorflow/contrib/tensor_forest/hybrid:utils' failed (Exit 1) In file included from tensorflow/contrib/tensor_forest/hybrid/core/ops/utils.cc:15: In file included from ./tensorflow/contrib/tensor_forest/hybrid/core/ops/utils.h:20: In file included from ./tensorflow/core/framework/tensor.h:19: In file included from ./third_party/eigen3/unsupported/Eigen/CXX11

Multiple OpenCl Kernels

大兔子大兔子 submitted on 2019-12-04 20:32:13
I just wanted to ask if somebody can give me a heads up on what to pay attention to when using several simple kernels one after another. Can I use the same CommandQueue? Can I just run clCreateProgramWithSource + cl_program several times, with a different cl_program each time? What did I forget? Thanks! Grizzly: You can either create and compile several programs (and create kernel objects from those), or you can put all kernels into the same program (clCreateProgramWithSource takes several strings, after all) and create all your kernels from that one. Either should work fine using the same CommandQueue.
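A minimal host-side sketch of the second approach (all kernels in one program, one command queue). The kernel names kernelA/kernelB and the source string are illustrative, not taken from the original question; error checking is omitted.

```c
// Sketch: two kernels in one program, reusing one in-order command queue.
#include <CL/cl.h>

static const char *src =
    "__kernel void kernelA(__global float *x) { x[get_global_id(0)] *= 2.0f; }\n"
    "__kernel void kernelB(__global float *x) { x[get_global_id(0)] += 1.0f; }\n";

void run_two_kernels(cl_context ctx, cl_device_id dev, cl_command_queue queue,
                     cl_mem buf, size_t n)
{
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

    cl_kernel ka = clCreateKernel(prog, "kernelA", NULL);
    cl_kernel kb = clCreateKernel(prog, "kernelB", NULL);

    clSetKernelArg(ka, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kb, 0, sizeof(cl_mem), &buf);

    // Both kernels can be enqueued on the same in-order queue; they execute
    // one after the other in enqueue order.
    clEnqueueNDRangeKernel(queue, ka, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(queue, kb, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(queue);

    clReleaseKernel(ka);
    clReleaseKernel(kb);
    clReleaseProgram(prog);
}
```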

OpenCL clBuildProgram caches source, and does not recompile if #include'd source changes

早过忘川 submitted on 2019-12-04 20:08:34
Question: I have implemented a project with OpenCL. I have a file which contains the kernel function, and the functions used by the kernel are included from a separate header file, but when I change the included file, sometimes the changes are applied and sometimes they are not, which leaves me confused about whether the application has a bug or not. I checked other posts on Stack Overflow and saw that NVIDIA has a serious problem with passing -I{include directory}, so I changed it and gave the header
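A possible workaround on NVIDIA, offered as an assumption rather than a confirmed fix: disable the driver's on-disk compute cache via the CUDA_CACHE_DISABLE environment variable (reported to also affect NVIDIA's OpenCL kernel cache) before the first build, and pass the include directory explicitly as a build option. A minimal sketch:

```c
// Sketch of two workarounds for a stale compiler cache on NVIDIA; driver
// behaviour varies between versions, so treat this as a starting point,
// not a guaranteed fix.
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void build_without_stale_cache(cl_program prog, cl_device_id dev,
                               const char *include_dir)
{
    // 1) Disable the on-disk compute cache (must be set before the driver
    //    compiles anything in this process).
    setenv("CUDA_CACHE_DISABLE", "1", 1);

    // 2) Pass the include directory explicitly as a build option.
    char options[512];
    snprintf(options, sizeof(options), "-I %s", include_dir);
    clBuildProgram(prog, 1, &dev, options, NULL, NULL);
}
```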

Is there any guarantee that all of threads in WaveFront (OpenCL) always synchronized?

雨燕双飞 submitted on 2019-12-04 19:52:40
As is known, there are WARPs (in CUDA) and WaveFronts (in OpenCL): http://courses.cs.washington.edu/courses/cse471/13sp/lectures/GPUsStudents.pdf WARP in CUDA: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture 4.1. SIMT Architecture ... A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path. If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete
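An illustrative OpenCL C sketch (not from the question) of the divergence described in the quoted passage: a data-dependent branch forces the warp/wavefront to execute both paths serially, with the inactive lanes masked off.

```c
// Illustrative kernel: threads in the same warp/wavefront take different
// branches, so the hardware executes both paths one after the other.
__kernel void divergent(__global const int *in, __global int *out)
{
    int gid = get_global_id(0);

    if (in[gid] > 0) {          // data-dependent branch
        out[gid] = in[gid] * 2; // lanes taking this path run first ...
    } else {
        out[gid] = -in[gid];    // ... then the remaining lanes run this path
    }
    // Do not rely on implicit lock-step synchronization across the branch;
    // the only portable synchronization point within a work-group is
    // barrier(), and it must be reached by all work-items.
}
```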

Is it necessary to enqueue read/write when using CL_MEM_USE_HOST_PTR?

我的未来我决定 submitted on 2019-12-04 19:43:25
Assume that I am wait()ing for the kernel to compute the work. I was wondering whether, when allocating a buffer using the CL_MEM_USE_HOST_PTR flag, it is necessary to use enqueueRead/Write on the buffer, or whether they can always be omitted. Note that I am aware of this note in the reference: Calling clEnqueueReadBuffer to read a region of the buffer object with the ptr argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being read is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined
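For reference, a common pattern with CL_MEM_USE_HOST_PTR is to map the buffer rather than assuming the host pointer is already current; the following is a minimal sketch under that assumption, with error checking omitted.

```c
// Sketch: with CL_MEM_USE_HOST_PTR, mapping the buffer is the portable way
// to get a coherent view on the host side.
#include <CL/cl.h>

void read_results(cl_command_queue queue, cl_mem buf, size_t bytes)
{
    cl_int err;
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE /* blocking */,
                                 CL_MAP_READ, 0, bytes,
                                 0, NULL, NULL, &err);
    // ... read from p here; on many implementations p equals the original
    //     host_ptr, but the map call is what guarantees the data is in sync.
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
    clFinish(queue);
}
```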

OpenCL LLVM IR generation from Clang

邮差的信 submitted on 2019-12-04 19:23:48
I am using the following command line for clang: clang -Dcl_clang_storage_class_specifiers -isystem $LIBCLC/generic/include -include clc/clc.h -target nvptx--nvidiacl -x cl some_kernel.cl -emit-llvm -S -o some_kernel.ll the result is: ; ModuleID = 'kernel.cl' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx--nvidiacl" ; Function Attrs: noinline nounwind define void @vector_add(float addrspace(1)* nocapture %vec1, float addrspace(1)* nocapture %vec2, float addrspace(1)*

How do I test OpenCL on GPU when logged in remotely on Mac?

怎甘沉沦 submitted on 2019-12-04 18:36:24
Question: My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, the output is as follows (notice the "GeForce 9400" line): 2 devices found Device #0 name = GeForce 9400 Device #1 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2
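The poster's program is not reproduced in this excerpt; the following is a minimal enumeration sketch in the same spirit (hypothetical, listing every device on the first platform), which can be run from both the console and an ssh session to compare the output.

```c
// Minimal device-enumeration sketch (not the poster's actual program):
// list every OpenCL device on the first platform and print its name.
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id devices[8];
    cl_uint count = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &count);
    if (count > 8) count = 8;
    printf("%u devices found\n", count);

    for (cl_uint i = 0; i < count; ++i) {
        char name[256];
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("Device #%u name = %s\n", i, name);
    }
    return 0;
}
```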

Dynamic global memory allocation in opencl kernel

梦想与她 submitted on 2019-12-04 18:10:27
Is it possible to dynamically allocate global memory from the kernel? In CUDA it is possible, but I would like to know if this is also possible in OpenCL on Intel GPUs. For example: __kernel void foo() { ... call malloc or clCreateBuffer here ... } Is it possible? If yes, how exactly? No, this is not currently allowed in OpenCL. You could implement your own heap by creating one very large buffer up front, and then 'allocate' regions of the buffer by handing out offsets (using atomic_add to avoid synchronisation issues). However, in most cases I suspect it would be better just to rethink your
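A sketch of the "roll your own heap" idea from the answer: one large pre-allocated __global buffer plus a shared bump pointer advanced with atomic_add. The names (heap, head, my_alloc) and the fixed 16-byte allocation are illustrative only.

```c
// Sketch of a bump-pointer "heap" in OpenCL C: the host allocates one big
// buffer plus an int counter initialised to 0; work-items reserve regions
// by atomically advancing the counter. There is no free(); the host resets
// *head between launches.
__global char *my_alloc(__global char *heap,
                        volatile __global int *head,
                        int bytes)
{
    int offset = atomic_add(head, bytes);   // reserve a region race-free
    return heap + offset;
}

__kernel void foo(__global char *heap, volatile __global int *head)
{
    // each work-item grabs 16 private bytes from the shared heap
    __global char *p = my_alloc(heap, head, 16);
    p[0] = (char)get_global_id(0);
}
```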

Realistic deadlock example in CUDA/OpenCL

时光总嘲笑我的痴心妄想 submitted on 2019-12-04 16:50:54
For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT / SIMD. I came up with this snippet, which seems to be a good example. Any input would be appreciated. … int x = threadID / 2; if (threadID > x) { value[threadID] = 42; barrier(); } else { value2[threadID/2] = 13; barrier(); } result = value[threadID/2] + value2[threadID/2]; I know it is neither proper CUDA C nor OpenCL C. A simple deadlock that is actually easy to catch by a novice CUDA programmer is when one tries to implement a critical section for a single thread, that
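As a companion to the truncated remark above, here is a hedged OpenCL C sketch of that critical-section pattern (the lock, counter, and exact control flow are illustrative): on lock-step SIMT hardware, the lanes that keep spinning can prevent the lane holding the lock from ever reaching the release.

```c
// Illustrative broken critical section: all work-items in a wavefront reach
// the loop together; because the hardware serialises the divergent paths,
// the lane that acquires the lock may never get to release it while its
// sibling lanes keep spinning, so the whole wavefront can hang.
__kernel void broken_critical_section(volatile __global int *lock,
                                      __global int *counter)
{
    int done = 0;
    while (!done) {
        if (atomic_cmpxchg(lock, 0, 1) == 0) {   // try to take the lock
            counter[0] += 1;                     // critical section
            atomic_xchg(lock, 0);                // release
            done = 1;
        }
        // lanes that failed the compare-exchange loop again; on lock-step
        // hardware this pattern can livelock or deadlock the wavefront
    }
}
```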

need to convert C++ template to C99 code

谁都会走 submitted on 2019-12-04 14:57:39
I am porting CUDA code to OpenCL - CUDA allows C++ constructs like templates, while OpenCL is strictly C99. So, what is the most painless way of porting templates to C? I thought of using function pointers for the template parameters. Before there were templates, there were preprocessor macros. Search the web for "generic programming in C" for inspiration. Here is the technique I used for converting some CUDA algorithms from the Modern GPU code to my GPGPU VexCL library (with OpenCL support). Each template function in CUDA code is converted to two template functions in OpenCL host code. The
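The macro route mentioned above ("before there were templates, there were preprocessor macros") can look like the following sketch; the DEFINE_SUM macro and function names are illustrative, not from the original answer. The same trick can be used to stamp out one OpenCL kernel source string per element type.

```c
// Sketch of macro-based "generic programming in C": instantiate one C99
// function per element type, mimicking a C++ function template.
#include <stddef.h>

#define DEFINE_SUM(T)                                   \
    static T sum_##T(const T *data, size_t n)           \
    {                                                   \
        T acc = (T)0;                                   \
        for (size_t i = 0; i < n; ++i)                  \
            acc += data[i];                             \
        return acc;                                     \
    }

DEFINE_SUM(float)   /* defines sum_float(const float *, size_t)   */
DEFINE_SUM(double)  /* defines sum_double(const double *, size_t) */

/* usage: float total = sum_float(values, count); */
```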