opencl

Building Tensorflow with OpenCL support fails on Ubuntu 18.04

蓝咒 submitted on 2019-12-04 20:32:14
While trying to compile TensorFlow on Ubuntu 18.04 with this configuration, I'm running into this error: ERROR: /home/joao/Documents/playground/tensorflow/tensorflow/contrib/tensor_forest/hybrid/BUILD:72:1: C++ compilation of rule '//tensorflow/contrib/tensor_forest/hybrid:utils' failed (Exit 1) In file included from tensorflow/contrib/tensor_forest/hybrid/core/ops/utils.cc:15: In file included from ./tensorflow/contrib/tensor_forest/hybrid/core/ops/utils.h:20: In file included from ./tensorflow/core/framework/tensor.h:19: In file included from ./third_party/eigen3/unsupported/Eigen/CXX11

Multiple OpenCl Kernels

大兔子大兔子 submitted on 2019-12-04 20:32:13
I just wanted to ask if somebody can give me a heads up on what to pay attention to when using several simple kernels one after another. Can I use the same CommandQueue? Can I just run clCreateProgramWithSource + cl_program several times, with a different cl_program each time? What did I forget? Thanks! Grizzly: You can either create and compile several programs (and create kernel objects from those), or you can put all kernels into the same program (clCreateProgramWithSource takes several strings, after all) and create all your kernels from that one. Either should work fine using the same CommandQueue.
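A minimal host-side sketch of the second approach (all kernels in one program, one command queue). The kernel names kernelA/kernelB and the source string are illustrative, not taken from the original question; error checking is omitted.

```c
// Sketch: two kernels in one program, reusing one in-order command queue.
#include <CL/cl.h>

static const char *src =
    "__kernel void kernelA(__global float *x) { x[get_global_id(0)] *= 2.0f; }\n"
    "__kernel void kernelB(__global float *x) { x[get_global_id(0)] += 1.0f; }\n";

void run_two_kernels(cl_context ctx, cl_device_id dev, cl_command_queue queue,
                     cl_mem buf, size_t n)
{
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

    cl_kernel ka = clCreateKernel(prog, "kernelA", NULL);
    cl_kernel kb = clCreateKernel(prog, "kernelB", NULL);

    clSetKernelArg(ka, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kb, 0, sizeof(cl_mem), &buf);

    // Both kernels can be enqueued on the same in-order queue; they execute
    // one after the other in enqueue order.
    clEnqueueNDRangeKernel(queue, ka, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(queue, kb, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(queue);

    clReleaseKernel(ka);
    clReleaseKernel(kb);
    clReleaseProgram(prog);
}
```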

OpenCL clBuildProgram caches source, and does not recompile if #include'd source changes

早过忘川 submitted on 2019-12-04 20:08:34
Question: I have implemented a project with OpenCL. I have a file which contains the kernel function, and the functions used by the kernel are included from a separate header file, but when I change the included file, sometimes the changes are applied and sometimes they are not, which leaves me confused about whether the application has a bug or not. I checked other posts on Stack Overflow and saw that NVIDIA has a serious problem with passing -I{include directory}, so I changed it and gave the header
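A possible workaround on NVIDIA, offered as an assumption rather than a confirmed fix: disable the driver's on-disk compute cache via the CUDA_CACHE_DISABLE environment variable (reported to also affect NVIDIA's OpenCL kernel cache) before the first build, and pass the include directory explicitly as a build option. A minimal sketch:

```c
// Sketch of two workarounds for a stale compiler cache on NVIDIA; driver
// behaviour varies between versions, so treat this as a starting point,
// not a guaranteed fix.
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void build_without_stale_cache(cl_program prog, cl_device_id dev,
                               const char *include_dir)
{
    // 1) Disable the on-disk compute cache (must be set before the driver
    //    compiles anything in this process).
    setenv("CUDA_CACHE_DISABLE", "1", 1);

    // 2) Pass the include directory explicitly as a build option.
    char options[512];
    snprintf(options, sizeof(options), "-I %s", include_dir);
    clBuildProgram(prog, 1, &dev, options, NULL, NULL);
}
```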

Is there any guarantee that all of threads in WaveFront (OpenCL) always synchronized?

雨燕双飞 submitted on 2019-12-04 19:52:40
As is known, there are WARPs (in CUDA) and WaveFronts (in OpenCL): http://courses.cs.washington.edu/courses/cse471/13sp/lectures/GPUsStudents.pdf WARP in CUDA: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#simt-architecture 4.1. SIMT Architecture ... A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path. If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete
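An illustrative OpenCL C sketch (not from the question) of the divergence described in the quoted passage: a data-dependent branch forces the warp/wavefront to execute both paths serially, with the inactive lanes masked off.

```c
// Illustrative kernel: threads in the same warp/wavefront take different
// branches, so the hardware executes both paths one after the other.
__kernel void divergent(__global const int *in, __global int *out)
{
    int gid = get_global_id(0);

    if (in[gid] > 0) {          // data-dependent branch
        out[gid] = in[gid] * 2; // lanes taking this path run first ...
    } else {
        out[gid] = -in[gid];    // ... then the remaining lanes run this path
    }
    // Do not rely on implicit lock-step synchronization across the branch;
    // the only portable synchronization point within a work-group is
    // barrier(), and it must be reached by all work-items.
}
```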

Is it necessary to enqueue read/write when using CL_MEM_USE_HOST_PTR?

我的未来我决定 submitted on 2019-12-04 19:43:25
Assume that I am wait()ing for the kernel to compute the work. I was wondering whether, when allocating a buffer using the CL_MEM_USE_HOST_PTR flag, it is necessary to use enqueueRead/Write on the buffer, or whether they can always be omitted. Note that I am aware of this note in the reference: Calling clEnqueueReadBuffer to read a region of the buffer object with the ptr argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being read is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined
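For reference, a common pattern with CL_MEM_USE_HOST_PTR is to map the buffer rather than assuming the host pointer is already current; the following is a minimal sketch under that assumption, with error checking omitted.

```c
// Sketch: with CL_MEM_USE_HOST_PTR, mapping the buffer is the portable way
// to get a coherent view on the host side.
#include <CL/cl.h>

void read_results(cl_command_queue queue, cl_mem buf, size_t bytes)
{
    cl_int err;
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE /* blocking */,
                                 CL_MAP_READ, 0, bytes,
                                 0, NULL, NULL, &err);
    // ... read from p here; on many implementations p equals the original
    //     host_ptr, but the map call is what guarantees the data is in sync.
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
    clFinish(queue);
}
```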

OpenCL LLVM IR generation from Clang

邮差的信 submitted on 2019-12-04 19:23:48
I am using the following command line for clang: clang -Dcl_clang_storage_class_specifiers -isystem $LIBCLC/generic/include -include clc/clc.h -target nvptx--nvidiacl -x cl some_kernel.cl -emit-llvm -S -o some_kernel.ll the result is: ; ModuleID = 'kernel.cl' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx--nvidiacl" ; Function Attrs: noinline nounwind define void @vector_add(float addrspace(1)* nocapture %vec1, float addrspace(1)* nocapture %vec2, float addrspace(1)*

How do I test OpenCL on GPU when logged in remotely on Mac?

怎甘沉沦 submitted on 2019-12-04 18:36:24
Question: My OpenCL program can find the GPU device when I am logged in at the console, but not when I am logged in remotely with ssh. Further, if I run the program as root in the ssh session, the program can find the GPU. The computer is a Snow Leopard Mac with a GeForce 9400 GPU. If I run the program (see below) from the console or as root, the output is as follows (notice the "GeForce 9400" line): 2 devices found Device #0 name = GeForce 9400 Device #1 name = Intel(R) Core(TM)2 Duo CPU P8700 @ 2
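The poster's program is not reproduced in this excerpt; the following is a minimal enumeration sketch in the same spirit (hypothetical, listing every device on the first platform), which can be run from both the console and an ssh session to compare the output.

```c
// Minimal device-enumeration sketch (not the poster's actual program):
// list every OpenCL device on the first platform and print its name.
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id devices[8];
    cl_uint count = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &count);
    if (count > 8) count = 8;
    printf("%u devices found\n", count);

    for (cl_uint i = 0; i < count; ++i) {
        char name[256];
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("Device #%u name = %s\n", i, name);
    }
    return 0;
}
```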

Dynamic global memory allocation in opencl kernel

梦想与她 submitted on 2019-12-04 18:10:27
Is it possible to dynamically allocate global memory from the kernel? In CUDA it is possible, but I would like to know if this is also possible in OpenCL on Intel GPUs. For example: __kernel void foo() { ... call malloc or clCreateBuffer here ... } Is it possible? If yes, how exactly? No, this is not currently allowed in OpenCL. You could implement your own heap by creating one very large buffer up front, and then 'allocate' regions of the buffer by handing out offsets (using atomic_add to avoid synchronisation issues). However, in most cases I suspect it would be better just to rethink your
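A sketch of the "roll your own heap" idea from the answer: one large pre-allocated __global buffer plus a shared bump pointer advanced with atomic_add. The names (heap, head, my_alloc) and the fixed 16-byte allocation are illustrative only.

```c
// Sketch of a bump-pointer "heap" in OpenCL C: the host allocates one big
// buffer plus an int counter initialised to 0; work-items reserve regions
// by atomically advancing the counter. There is no free(); the host resets
// *head between launches.
__global char *my_alloc(__global char *heap,
                        volatile __global int *head,
                        int bytes)
{
    int offset = atomic_add(head, bytes);   // reserve a region race-free
    return heap + offset;
}

__kernel void foo(__global char *heap, volatile __global int *head)
{
    // each work-item grabs 16 private bytes from the shared heap
    __global char *p = my_alloc(heap, head, 16);
    p[0] = (char)get_global_id(0);
}
```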

Realistic deadlock example in CUDA/OpenCL

时光总嘲笑我的痴心妄想 submitted on 2019-12-04 16:50:54
For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT / SIMD. I came up with this snippet, which seems to be a good example. Any input would be appreciated. … int x = threadID / 2; if (threadID > x) { value[threadID] = 42; barrier(); } else { value2[threadID/2] = 13; barrier(); } result = value[threadID/2] + value2[threadID/2]; I know it is neither proper CUDA C nor OpenCL C. A simple deadlock that is actually easy to catch by a novice CUDA programmer is when one tries to implement a critical section for a single thread, that
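As a companion to the truncated remark above, here is a hedged OpenCL C sketch of that critical-section pattern (the lock, counter, and exact control flow are illustrative): on lock-step SIMT hardware, the lanes that keep spinning can prevent the lane holding the lock from ever reaching the release.

```c
// Illustrative broken critical section: all work-items in a wavefront reach
// the loop together; because the hardware serialises the divergent paths,
// the lane that acquires the lock may never get to release it while its
// sibling lanes keep spinning, so the whole wavefront can hang.
__kernel void broken_critical_section(volatile __global int *lock,
                                      __global int *counter)
{
    int done = 0;
    while (!done) {
        if (atomic_cmpxchg(lock, 0, 1) == 0) {   // try to take the lock
            counter[0] += 1;                     // critical section
            atomic_xchg(lock, 0);                // release
            done = 1;
        }
        // lanes that failed the compare-exchange loop again; on lock-step
        // hardware this pattern can livelock or deadlock the wavefront
    }
}
```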

need to convert C++ template to C99 code

谁都会走 submitted on 2019-12-04 14:57:39
I am porting CUDA code to OpenCL - CUDA allows C++ constructs like templates, while OpenCL is strictly C99. So, what is the most painless way of porting templates to C? I thought of using function pointers for the template parameters. Before there were templates, there were preprocessor macros. Search the web for "generic programming in C" for inspiration. Here is the technique I used for converting some CUDA algorithms from the Modern GPU code to my GPGPU VexCL library (with OpenCL support). Each template function in CUDA code is converted to two template functions in OpenCL host code. The
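The macro route mentioned above ("before there were templates, there were preprocessor macros") can look like the following sketch; the DEFINE_SUM macro and function names are illustrative, not from the original answer. The same trick can be used to stamp out one OpenCL kernel source string per element type.

```c
// Sketch of macro-based "generic programming in C": instantiate one C99
// function per element type, mimicking a C++ function template.
#include <stddef.h>

#define DEFINE_SUM(T)                                   \
    static T sum_##T(const T *data, size_t n)           \
    {                                                   \
        T acc = (T)0;                                   \
        for (size_t i = 0; i < n; ++i)                  \
            acc += data[i];                             \
        return acc;                                     \
    }

DEFINE_SUM(float)   /* defines sum_float(const float *, size_t)   */
DEFINE_SUM(double)  /* defines sum_double(const double *, size_t) */

/* usage: float total = sum_float(values, count); */
```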