opencl

How to create OpenCL command queue?

妖精的绣舞 submitted on 2019-12-24 20:03:08
Question: I'm trying to learn OpenCL, but I can't even get a simple kernel to work. The code below comes from the book "OpenCL Programming by Example", which I modified, modified, modified... and still I have no clue what the problem is. Every time I run the program on my PC (AMD Athlon 5350 APU with Radeon R3), it prints the result as "0.0000". If I run the same executable on my other machine (which is a clone of this HD, so everything is the same) with an NVIDIA 1080 Ti, the program outputs "3

OpenCV: Failed to load OpenCL runtime

≡放荡痞女 submitted on 2019-12-24 17:31:40
Question: I am running a program in which I get the error in the title of my question. I found an answer here that suggests downloading OpenCV from GitHub, compiling it with ENABLE_OPENCL=OFF (using CMake), and building the application against the resulting libs. Or is there perhaps a way to set this flag in the program itself without modifying anything in OpenCV? I wonder whether it is possible to do that without having to remove and reinstall OpenCV-3.0. Answer 1: I resolved the problem by running: sudo apt

OpenCV: Failed to load OpenCL runtime

寵の児 submitted on 2019-12-24 17:30:34
Question: I am running a program in which I get the error in the title of my question. I found an answer here that suggests downloading OpenCV from GitHub, compiling it with ENABLE_OPENCL=OFF (using CMake), and building the application against the resulting libs. Or is there perhaps a way to set this flag in the program itself without modifying anything in OpenCV? I wonder whether it is possible to do that without having to remove and reinstall OpenCV-3.0. Answer 1: I resolved the problem by running: sudo apt

Matrix multiplication on GPU. Memory bank conflicts and latency hiding

大憨熊 submitted on 2019-12-24 14:25:16
Question: Edit: achievements over time are listed at the end of this question (~1 TFLOPS so far). I'm writing a kind of math library for C# using OpenCL (GPU) from a C++ DLL, and I have already done some optimizations on single-precision square matrix-matrix multiplication (for learning purposes and possible reuse in a neural-network program later). The kernel code below gets v1, a 1D array, as the rows of matrix1 (1024x1024), and v2, a 1D array, as the columns of matrix2 (1024x1024, transpose optimization), and puts the result
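The layout described in this question (matrix1 stored row by row, matrix2 stored transposed so that each of its columns is contiguous) can be sketched on the host in plain C. This is only an illustration of the data layout, not the asker's kernel; the function name `matmul_transposed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for square n x n matrices stored as flat 1D arrays.
 * a  holds A row by row:       a[i*n + k]  == A[i][k]
 * bt holds B transposed:       bt[j*n + k] == B[k][j]
 * With this layout the inner loop walks both inputs contiguously.
 * (Hypothetical helper, written for illustration only.) */
void matmul_transposed(const float *a, const float *bt, float *c, size_t n)
{
    assert(a != NULL && bt != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++) {
                acc += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = acc;
        }
    }
}
```

In an OpenCL kernel, the two outer loops would typically be replaced by the work-item indices, with each work-item computing one element of the result.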

unroll loops in an AMD OpenCL kernel

断了今生、忘了曾经 submitted on 2019-12-24 14:18:31
Question: I'm trying to assess performance differences in OpenCL on AMD. I have a kernel for a Hough transform. The kernel contains two #pragma unroll statements, but running it does not produce any speedup. kernel void hough_circle(read_only image2d_t imageIn, global int* in, const int w_hough, __global int * circle) { sampler_t sampler=CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST; int gid0 = get_global_id(0); int gid1 = get_global_id(1); uint4 pixel; int x0=0,y0=0,r;

OpenCL error: undefined reference to `_Z12atom_cmpxchgPVU8CLglobalmmm()'

ぃ、小莉子 submitted on 2019-12-24 14:05:39
Question: When compiling the following OpenCL kernel: #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable __kernel void kernel(__global ulong* mem) { atom_cmpxchg(&mem[0], 0, 1); } I get the following error: error: undefined reference to `_Z12atom_cmpxchgPVU8CLglobalmmm()' I'm using OpenCL from Rust with the OCL library. My OpenCL version is 1.2, my GPU is an Intel(R) Iris(TM) Graphics 550, and I'm on macOS Sierra 10.12.1. Answer 1: Check the CL_DEVICE_EXTENSIONS of your device with clGetDeviceInfo()
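The check suggested in the answer could be sketched as follows. `clGetDeviceInfo()` with `CL_DEVICE_EXTENSIONS` returns a space-separated list of extension names. The helper below only does the string matching, so it compiles without an OpenCL SDK. `has_extension` is a hypothetical name, and the excerpt does not confirm that a missing `cl_khr_int64_base_atomics` is the actual cause of the link error.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in the space-separated
 * list `ext_list` (the format used by CL_DEVICE_EXTENSIONS).
 * Hypothetical helper: the list itself would come from clGetDeviceInfo(),
 * which is not called here. */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept only matches bounded by spaces or the string ends,
         * so a longer extension name does not count as a hit. */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return true;
        }
        p++;
    }
    return false;
}
```

A host program might call `has_extension(ext_string, "cl_khr_int64_base_atomics")` before building a kernel that enables that extension.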

OpenCL Video Processing

邮差的信 submitted on 2019-12-24 13:15:21
Question: I'm about to write stacking software. I therefore want to extract the frames of one or more video files into an OpenCL buffer and then process them with an OpenCL kernel. But I don't know how to load the video frames, as I have never worked with video. Since I'm using OpenCL, my main focus is obviously high performance! I know there are libraries like ffmpeg or OpenCV and more, but as I'm not familiar with them I don't know which fits my needs best. So can you give me advice which library/function to use which

OpenCL Limit on for loop size?

邮差的信 submitted on 2019-12-24 13:10:06
Question: UPDATE: clEnqueueReadBuffer(command_queue, c_mem_obj, CL_TRUE, 0, LIST_SIZE * sizeof(double), C, 0, NULL, NULL); is returning -5, CL_OUT_OF_RESOURCES. This function/call should never return this! I've started using OpenCL and have come across a problem. If I allow a for loop (in the kernel) to run 10000 times, all of C comes back as 0; if I allow the loop to run 8000 times, the results are all correct. I have added waits around the kernel to ensure it completes, thinking I was pulling the data out
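When chasing a return value such as -5, a small lookup from numeric code to symbolic name can make host-side logging easier to read. The sketch below covers only the first few codes defined in `CL/cl.h`. The function name `cl_error_name` is hypothetical, and the helper takes a plain `int` so it compiles without the OpenCL headers.

```c
/* Maps a few OpenCL status codes to their symbolic names.
 * Hypothetical helper; the table is deliberately incomplete and only
 * lists the lowest-numbered codes from CL/cl.h. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:  return "CL_SUCCESS";
    case -1: return "CL_DEVICE_NOT_FOUND";
    case -2: return "CL_DEVICE_NOT_AVAILABLE";
    case -3: return "CL_COMPILER_NOT_AVAILABLE";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    default: return "unlisted code (see CL/cl.h)";
    }
}
```

Checking the return value of every enqueue call this way, including the kernel launch that precedes the read, helps show which call first reports the failure.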

A weird timing issue with “clEnqueueNDRangeKernel” in OpenCL

本秂侑毒 submitted on 2019-12-24 11:03:01
Question: I'm new to OpenCL and I'm experiencing a weird issue with it! I have a reduction kernel that I run several times. The problem is that when I profile the kernel's execution, the elapsed time (queued->end) stays almost the same, increasing only slightly, but when I measure the elapsed time within the C++ code, the time taken by the "clEnqueueNDRangeKernel" line increases rapidly! I have attached both the code and the profiling output! // execute the kernel globalWorkSize[0] =

How to accumulate vectors in OpenCL?

时光毁灭记忆、已成空白 submitted on 2019-12-24 09:57:49
Question: I have a set of operations running in a loop. for(int i = 0; i < row; i++) { sum += arr1[0] - arr2[0]; sum += arr1[0] - arr2[0]; sum += arr1[0] - arr2[0]; sum += arr1[0] - arr2[0]; arr1 += offset1; arr2 += offset2; } Now I'm trying to vectorize the operations like this: for(int i = 0; i < row; i++) { convert_int4(vload4(0, arr1) - vload4(0, arr2)); arr1 += offset1; arr2 += offset2; } But how do I accumulate the resulting vector in the scalar sum without using a loop? I'm using OpenCL 2.0. Answer 1: The
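The answer is cut off in this excerpt, so the following is only one common pattern, not necessarily what the answer proposed. It keeps a four-lane accumulator inside the loop and adds the four lanes together once at the end. The sketch is plain C, with a hypothetical `int4_t` struct standing in for OpenCL's `int4`. It assumes `int` elements and assumes the four scalar lines in the question were meant to read elements 0 to 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int4 so the sketch compiles as C. */
typedef struct { int x, y, z, w; } int4_t;

/* Emulates vload4(0, p): loads four consecutive ints. */
static int4_t load4(const int *p)
{
    int4_t v = { p[0], p[1], p[2], p[3] };
    return v;
}

/* Sums (arr1[0..3] - arr2[0..3]) over `rows` rows, advancing each
 * pointer by its own offset per row. */
int sum_of_differences(const int *arr1, const int *arr2, size_t rows,
                       size_t offset1, size_t offset2)
{
    assert(arr1 != NULL && arr2 != NULL);
    int4_t acc = { 0, 0, 0, 0 };
    for (size_t i = 0; i < rows; i++) {
        int4_t a = load4(arr1);
        int4_t b = load4(arr2);
        /* Lane-wise accumulate; in OpenCL C this is acc += a - b; */
        acc.x += a.x - b.x;
        acc.y += a.y - b.y;
        acc.z += a.z - b.z;
        acc.w += a.w - b.w;
        arr1 += offset1;
        arr2 += offset2;
    }
    /* Single horizontal add after the loop, no loop over lanes. */
    return acc.x + acc.y + acc.z + acc.w;
}
```

In OpenCL C, the final line maps directly onto component access on an `int4` (`acc.x + acc.y + acc.z + acc.w`), so the horizontal reduction happens once per work-item rather than once per row.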