OpenCL

std::vector and c-style arrays

我与影子孤独终老i submitted 2019-12-09 10:32:30

Question: I am experimenting with OpenCL to increase the speed of our software. We work with maps a lot and, to simplify, represent a map as a std::vector<std::vector<int>>. The OpenCL API takes raw C-style pointers as arguments, for example int* in the case above. My questions: Does the STL guarantee that a vector's storage is contiguous in memory? Can I safely cast a std::vector<int> to int* and expect that to work? And in the case of a vector of vectors, does this still hold?

How to debug OpenCL on Nvidia GPUs?

荒凉一梦 submitted 2019-12-09 09:51:46

Question: Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tooling does not allow OpenCL debugging, and AMD's and Intel's tools only allow it on their own devices. Answer 1: gDEBugger might help you somewhat (I've never used it though), but other than that there isn't any tool I know of that can set breakpoints or inspect variables inside a kernel. If it is a long kernel, perhaps try saving intermediate outputs from it.
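The "save intermediate outputs" suggestion usually means adding an extra __global buffer that the kernel streams values into, then reading it back on the host; on OpenCL 1.2+ devices, printf inside the kernel is another option. A hypothetical sketch (the buffer name dbg and the computation are placeholders):

```c
/* Hypothetical debug pattern: stream intermediate values to a global buffer. */
kernel void my_kernel(global const float* in,
                      global float* out,
                      global float* dbg)   /* extra buffer, read back on the host */
{
    size_t i = get_global_id(0);
    float tmp = in[i] * 2.0f;

    dbg[i] = tmp;                          /* inspect after the kernel finishes */
    /* printf("i=%u tmp=%f\n", (uint)i, tmp);   OpenCL 1.2+ alternative */

    out[i] = tmp + 1.0f;
}
```

It is crude compared to a real debugger, but it works on any vendor's device.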

Using async_work_group_copy with a custom data type

生来就可爱ヽ(ⅴ<●) submitted 2019-12-09 08:04:19

Question: I need to copy some data from __global to __local memory in OpenCL using async_work_group_copy. The issue is that I'm not using a built-in data type. A snippet of what I have tried:

    typedef struct Y { ... } Y;
    typedef struct X { Y y[MAXSIZE]; } X;

    kernel void krnl(global X* restrict x) {
        global const Y* l = x[a].y;
        local Y* l2;
        size_t sol2 = sizeof(l);
        async_work_group_copy(l2, l, sol2, 0);
    }

where 'a' is just a vector of int. This code does not work, specifically because the gentype is not a built-in one. The spec (1.2) says: We use the generic type name gentype to indicate the built
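Indeed, async_work_group_copy is specified only for the built-in gentypes. Note also that the snippet passes sizeof(l) (the size of a pointer) where the function expects an element count, and l2 never points at actual local storage. With a user-defined struct, the usual workaround is a plain cooperative copy loop into a __local array, with a barrier before use. A hedged sketch (the struct body and MAXSIZE are placeholders from the question):

```c
typedef struct { float v; } Y;   /* placeholder body */
#define MAXSIZE 16

kernel void krnl(global const Y* restrict src) {
    local Y tile[MAXSIZE];       /* real local storage, not a dangling pointer */

    /* Each work-item copies a strided share of the elements. */
    for (size_t i = get_local_id(0); i < MAXSIZE; i += get_local_size(0))
        tile[i] = src[i];

    barrier(CLK_LOCAL_MEM_FENCE);   /* tile is now valid for the whole group */
    /* ... use tile ... */
}
```

An alternative is to reinterpret the struct array as a buffer of a same-sized built-in type (e.g. float4) and keep async_work_group_copy, provided the sizes and alignment match.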

Understanding the usage of OpenCL in OpenCV (Mat/ Umat Objects)

情到浓时终转凉″ submitted 2019-12-09 07:08:47

Question: I ran the code below to check the performance difference between GPU and CPU usage. I am measuring the average time of the cv::cvtColor() function. I make four function calls:

    Just_mat()     (Mat object, without OpenCL)
    Just_UMat()    (UMat object, without OpenCL)
    OpenCL_Mat()   (Mat object, using OpenCL)
    OpenCL_UMat()  (UMat object, using OpenCL)

for both CPU and GPU. I did not find a large performance difference between GPU and CPU usage.

    int main(int argc, char* argv[]) {
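One thing worth checking: OpenCV's Transparent API only dispatches to OpenCL when the arguments are cv::UMat and the OpenCL backend is enabled; calls on plain cv::Mat always run on the CPU regardless of the OpenCL setting, which alone can explain similar timings. Also, the first UMat call includes OpenCL kernel compilation, so it should be excluded from the average. A minimal fragment (file name is a placeholder):

```cpp
cv::ocl::setUseOpenCL(true);                         // enable the OpenCL backend
cv::UMat src = cv::imread("in.png").getUMat(cv::ACCESS_READ);
cv::UMat dst;
cv::cvtColor(src, dst, cv::COLOR_BGR2GRAY);          // warm-up: compiles the kernel
cv::cvtColor(src, dst, cv::COLOR_BGR2GRAY);          // time this call instead
```

For a small image, transfer overhead can also swamp any GPU speedup on cvtColor.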

OpenCL integration with Android

白昼怎懂夜的黑 submitted 2019-12-09 06:54:59

Question: I have searched a lot on Google but I am unable to find good documentation on integrating OpenCL with Android. I referred to this link: https://aplacetogeek.wordpress.com/android-with-opencl-tutorial/ but it seems incomplete. Does anyone know how to go about using OpenCL on Android? Working example code, if any, would also be appreciated. I want to learn about it. Answer 1: Similar questions have been asked before; I suggest you read the following pages first: How to use OpenCL
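A common complication is that many Android devices ship a vendor libOpenCL.so, but the NDK provides no OpenCL headers or stub library to link against. A frequently used approach is to bundle the Khronos headers and load the vendor library at runtime with dlopen. A hedged sketch (library name and path vary by vendor):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Vendor-specific; "libOpenCL.so" works on many devices. */
    void* h = dlopen("libOpenCL.so", RTLD_NOW);
    if (!h) { fprintf(stderr, "no OpenCL runtime: %s\n", dlerror()); return 1; }

    /* Resolve entry points instead of linking against a stub library. */
    typedef int (*clGetPlatformIDs_fn)(unsigned, void*, unsigned*);
    clGetPlatformIDs_fn getPlatforms =
        (clGetPlatformIDs_fn)dlsym(h, "clGetPlatformIDs");

    unsigned n = 0;
    if (getPlatforms && getPlatforms(0, NULL, &n) == 0)
        printf("platforms: %u\n", n);
    dlclose(h);
    return 0;
}
```

Whether the library is even present depends on the SoC vendor (Adreno, Mali, PowerVR); some devices ship no OpenCL runtime at all.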

Matrix inversion in OpenCL

人盡茶涼 submitted 2019-12-09 05:46:35

Question: I am trying to accelerate some computations using OpenCL, and part of the algorithm consists of inverting a matrix. Is there any open-source library or freely available code that computes an LU factorization (LAPACK dgetrf and dgetri) or a general inverse, written in OpenCL or CUDA? The matrix is real and square but has no other special properties. So far I've managed to find only basic BLAS matrix-vector operations implemented on the GPU. The matrix is rather small,

Untrusted GPGPU code (OpenCL etc) - is it safe? What risks?

我只是一个虾纸丫 submitted 2019-12-09 04:48:35

Question: There are many approaches when it comes to running untrusted code on a typical CPU: sandboxes, fake roots, virtualization... What about untrusted code for GPGPU (OpenCL, CUDA, or already compiled binaries)? Assuming that the memory on the graphics card is cleared before running such third-party untrusted code, are there any security risks? What kind of risks? Any way to prevent them? Is sandboxing possible or available on GPGPU? Maybe binary instrumentation? Other techniques? P.S. I am more interested in

Fermi L2 cache hit latency?

独自空忆成欢 submitted 2019-12-09 00:57:39

Question: Does anyone know details about the L2 cache in Fermi? I have heard that it is as slow as global memory and that its purpose is only to enlarge effective memory bandwidth, but I can't find any official source confirming this. Has anyone measured the hit latency of L2? What about its size, line size, and other parameters? In effect, how do L2 read misses affect performance? My sense is that L2 only matters in very memory-bound applications. Please feel free to give your opinions. Thanks. Answer 1:
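Absent official numbers, one way to measure the latency yourself is a single-work-item pointer chase over a buffer sized to fit in L2 but not in L1, timing the chain of dependent loads. A hedged OpenCL sketch (buffer sizing and iteration count are left to the host):

```c
/* Pointer chase: every load depends on the previous one, so the elapsed
 * time divided by iters approximates the load latency of whichever cache
 * level the buffer fits in. Host fills next[i] = (i + stride) % n. */
kernel void chase(global const int* next, global int* sink, int iters) {
    int j = 0;
    for (int i = 0; i < iters; ++i)
        j = next[j];            /* dependent load chain */
    *sink = j;                  /* keep the chain from being optimized away */
}
```

Run it once per buffer size and the latency steps in the resulting curve reveal the cache capacities as well.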

OpenCL image2d_t doesn't write back values

断了今生、忘了曾经 submitted 2019-12-09 00:41:19

Windows 7, AMD APP SDK 2.6, ASIC: Redwood. I am trying to write a simple pass-through kernel to see what the issue is, and I can't seem to find what the error might be.

    void kernel_test(CLManager* clMgr, int W, int H) {
        cl::ImageFormat format;
        format.image_channel_order = CL_RGBA;
        format.image_channel_data_type = CL_FLOAT;

        cl_float4* inp = new cl_float4[W * H];
        for (int i = 0; i < W * H; ++i) {
            inp[i].s[0] = 1.0f;
            inp[i].s[1] = 0.0f;
            inp[i].s[2] = 0.0f;
            inp[i].s[3] = 1.0f;
        }
        cl_float4* oup = new cl_float4[W * H];
        cl::Image2D clInputImage = clMgr->createImage<cl::Image2D>(CL_MEM_READ_ONLY, format, W,
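Without the full listing it is hard to pinpoint the error, but two frequent culprits with image pass-through are creating the output image without write access and mismatching access qualifiers in the kernel. A hedged sketch of a minimal pass-through kernel (names are placeholders, not the asker's code):

```c
const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                      CLK_ADDRESS_CLAMP_TO_EDGE |
                      CLK_FILTER_NEAREST;

kernel void pass_thru(read_only image2d_t src, write_only image2d_t dst) {
    int2 p = (int2)(get_global_id(0), get_global_id(1));
    float4 px = read_imagef(src, smp, p);
    write_imagef(dst, p, px);   /* dst must be created with CL_MEM_WRITE_ONLY
                                   or CL_MEM_READ_WRITE, never CL_MEM_READ_ONLY */
}
```

It is also worth checking the return value of enqueueReadImage and that the read happens after the kernel event completes.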

OpenCL Floating point precision

痞子三分冷 submitted 2019-12-08 23:35:34

Question: I found a problem with host-device floating-point consistency in OpenCL. The floating-point results computed by OpenCL do not match the limits of my Visual Studio 2010 compiler when compiling for x86; when compiling for x64, they do match. I know it has something to do with this: http://www.viva64.com/en/b/0074/ The source I used during testing was: http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism When I ran the program in x86
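The likely cause, consistent with the linked article: 32-bit MSVC builds default to x87 code with 80-bit intermediate precision, while x64 always uses SSE2, whose 32-bit float arithmetic matches what the GPU computes. Forcing SSE2 on the x86 build typically reconciles the results; a hedged example of the relevant flags (file name is a placeholder):

```
rem MSVC (32-bit): generate SSE2 code instead of x87
cl /arch:SSE2 /fp:precise host.cpp

# GCC equivalent
g++ -msse2 -mfpmath=sse host.cpp
```

Even then, the GPU may legally differ in the last ULP for some operations, since OpenCL only mandates minimum accuracy bounds, not bit-identical results.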