opencl | 易学教程

When to use the OpenCL API scalar data types?

I have been having trouble understanding when to use the OpenCL API data types like cl_float, cl_uchar, etc., which can be found here: http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.html

The examples I have seen that involve copying a buffer to the device look like this:

    float data[DATA_SIZE]; // original data set given to device

    // Create the input and output arrays in device memory for our calculation
    input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL,

    // Write our data set into the input array in device memory
    err = clEnqueueWriteBuffer
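One way to see why the API ships its own scalar typedefs: OpenCL C fixes the size of every scalar type on every device, while the host C types of the same name can vary by platform and compiler. The sketch below uses Python's ctypes only as a convenient way to ask the host platform for its C type sizes. The helper name `mismatched_types` and the two dictionaries are made up for illustration and are not part of any OpenCL binding.

```python
import ctypes

# Sizes in bytes that OpenCL C guarantees for its scalar types on every device.
OPENCL_SCALAR_SIZES = {
    "char": 1, "uchar": 1,
    "short": 2, "ushort": 2,
    "int": 4, "uint": 4,
    "long": 8, "ulong": 8,
    "float": 4, "double": 8,
}

# Host C types with the same names; their sizes depend on the platform ABI.
HOST_C_TYPES = {
    "char": ctypes.c_byte, "uchar": ctypes.c_ubyte,
    "short": ctypes.c_short, "ushort": ctypes.c_ushort,
    "int": ctypes.c_int, "uint": ctypes.c_uint,
    "long": ctypes.c_long, "ulong": ctypes.c_ulong,
    "float": ctypes.c_float, "double": ctypes.c_double,
}


def mismatched_types():
    """Return the type names whose host size differs from the OpenCL C size."""
    return sorted(
        name
        for name, ctype in HOST_C_TYPES.items()
        if ctypes.sizeof(ctype) != OPENCL_SCALAR_SIZES[name]
    )


if __name__ == "__main__":
    print("host types that do not match the device size:", mismatched_types())
```

On a typical 64-bit Linux machine this reports nothing, while on 64-bit Windows `long` and `ulong` would show up, which is the kind of case where writing `cl_long` instead of `long` in host code matters. For `float`, the host type and the device type have the same size on mainstream platforms, so `sizeof(float)` in the buffer example above behaves the same as `sizeof(cl_float)` there.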

Can this OpenCL code be optimized?

I am working on a piece of OpenCL code for a specialized matrix function: for a Dx1 vector v, two DxD matrices A and B, and a constant c, return a 1xD vector r where

    r[i] = c * sum_over_j (v[j] * A[i][j] * B[i][j])

Below is what I have so far, but it runs freakishly slow. A version without the summation, which returns a DxD matrix, is about ten times faster. It is called from PyOpenCL, if that makes any difference. Is anything done wrong? Could it be optimized?

    #define D 1000
    ...
    __kernel void element_mult( __global float *result, __global const float *vector, __global const float *matrix, __global const
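Before tuning a kernel like this, it can help to have a slow but obviously correct host-side reference to compare results against. The following is a minimal sketch of the formula above in plain Python. It assumes the matrices are stored flat in row-major order (element (i, j) at index i * D + j), which the truncated kernel signature suggests but does not confirm. The function name `weighted_row_sums` is hypothetical.

```python
def weighted_row_sums(v, a_flat, b_flat, c):
    """Return r where r[i] = c * sum_j v[j] * A[i][j] * B[i][j].

    a_flat and b_flat hold the D x D matrices in row-major order
    (an assumption), so element (i, j) lives at index i * D + j.
    """
    d = len(v)
    if len(a_flat) != d * d or len(b_flat) != d * d:
        raise ValueError("matrices must contain D * D elements")
    result = []
    for i in range(d):        # one output element per row, like one work-item per i
        acc = 0.0
        for j in range(d):    # the whole sum over j stays inside that work-item
            acc += v[j] * a_flat[i * d + j] * b_flat[i * d + j]
        result.append(c * acc)
    return result


if __name__ == "__main__":
    # D = 2, A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], v = [1, 2], c = 2
    print(weighted_row_sums([1, 2], [1, 2, 3, 4], [5, 6, 7, 8], 2))
```

Comparing the kernel's output against this reference on a small D is a cheap way to confirm that an optimized version still computes the same quantity, up to floating-point rounding.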

Real-time video encoding in DirectShow

Question: I have developed a Windows application that captures video from an external device using DirectShow. The image resolution is 640x480, and the videos saved without compression are very large (approx. 27MB per second). My goal is to reduce this size as much as possible, so I am looking for an encoder that will allow me to compress the video in real time. It could be H.264, MPEG-2 or anything else. It must allow me to save the video to disk and it would be best if I also could stream it in

Strategy for doing final reduction

I am trying to implement an OpenCL version of a reduction over an array of floats. To achieve it, I took the following code snippet found on the web:

    __kernel void sumGPU ( __global const double *input, __global double *partialSums, __local double *localSums)
    {
      uint local_id = get_local_id(0);
      uint group_size = get_local_size(0);

      // Copy from global memory to local memory
      localSums[local_id] = input[get_global_id(0)];

      // Loop for computing localSums
      for (uint stride = group_size/2; stride>0; stride /=2)
      {
        // Waiting for each 2x2 addition into given workgroup
        barrier(CLK_LOCAL_MEM_FENCE);

        //
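To reason about the final step without a device, the two usual strategies can be emulated on the host. The sketch below is plain Python, not OpenCL. It assumes the truncated loop body is the common pattern in which work-items with `local_id < stride` add the element `stride` positions away, and it assumes a power-of-two work-group size. All function names are hypothetical.

```python
def group_reduce(values):
    """Emulate the stride-halving loop for one work-group."""
    local = list(values)                 # stands in for the __local array
    stride = len(local) // 2
    while stride > 0:
        for lid in range(stride):        # work-items with local_id < stride
            local[lid] += local[lid + stride]
        stride //= 2
    return local[0]


def partial_sums(data, group_size):
    """First stage: one partial sum per work-group."""
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    if len(data) % group_size:
        raise ValueError("len(data) must be a multiple of group_size")
    return [group_reduce(data[s:s + group_size])
            for s in range(0, len(data), group_size)]


def finish_on_host(data, group_size):
    """Strategy 1: read the partial sums back and add them on the host."""
    return sum(partial_sums(data, group_size))


def reduce_in_passes(data, group_size):
    """Strategy 2: re-run the same reduction on its own output until one value is left."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 to make progress")
    data = list(data)
    while len(data) > 1:
        padding = (-len(data)) % group_size
        data = partial_sums(data + [0] * padding, group_size)  # zero padding is neutral for a sum
    return data[0]


if __name__ == "__main__":
    values = list(range(16))
    print(partial_sums(values, 4), finish_on_host(values, 4), reduce_in_passes(values, 4))
```

Finishing on the host is simple and often adequate when the number of work-groups is small. Repeating the pass keeps the data on the device at the cost of extra kernel launches. Which one is faster depends on the device and the problem size, so it is worth measuring both.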

OpenCL reduction result wrong with large floats

I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct. I couldn't find any error in the code. Is it possible that I am getting wrong results because of the float type? If this is the case, what is the best approach to solve the issue?

There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating
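The effect can be reproduced on the host without a GPU. The sketch below emulates single precision in plain Python by rounding every intermediate value through a 4-byte float, then compares a naive running sum with compensated (Kahan) summation. Kahan summation and double-precision accumulation are common remedies, named here as illustrations and not necessarily what the answer above goes on to recommend. The helper names are hypothetical.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def naive_sum_f32(values):
    """Running sum where every addition is rounded to single precision."""
    total = 0.0
    for x in values:
        total = f32(total + f32(x))
    return total


def kahan_sum_f32(values):
    """Compensated summation, still using only single-precision values."""
    total = 0.0
    comp = 0.0                           # running estimate of the lost low-order bits
    for x in values:
        y = f32(f32(x) - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)   # what the addition above rounded away
        total = t
    return total


if __name__ == "__main__":
    values = range(65536)                # 0, 1, ..., 65535
    exact = sum(values)                  # exact integer arithmetic
    print("exact        :", exact)
    print("double (fsum):", math.fsum(values))
    print("float naive  :", naive_sum_f32(values))
    print("float Kahan  :", kahan_sum_f32(values))
    # Above 2**24 a float can no longer hold every integer:
    print("f32(2**24 + 1) =", f32(2**24 + 1))
```

A single-precision value has a 24-bit significand, so once the running total passes 2^24 = 16 777 216, adding a small integer can be rounded away partly or entirely. Summing 65 536 ones never leaves the exactly representable range, which matches the observation that the small-number test gives the right answer.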

How to run build using graphics drivers by using optirun (Bumblebee) from IDE (Netbeans, Eclipse)?

Does anyone know how to make Eclipse or NetBeans use the graphics card in Optimus laptops by invoking optirun (Bumblebee) inside the IDE, so that one can just use the IDE's run button to run the program on the graphics card? In its simplest form, I just want the IDE to do the equivalent of

    optirun ./javaproject

The way I did this in Eclipse was to first start the Java debugger jdwp and listen to a port. Then start the JVM with optirun java ... and use jdwp to connect to this port. Both tasks can be started at the same time in Eclipse by creating a Launch Group in the debug

How to pass a list of strings to an opencl kernel using pyopencl?

How do I pass a list of strings to an OpenCL kernel the right way? I tried this way using buffers (see following code), but I failed. OpenCL (struct.cl):

    typedef struct{ uchar uc[40]; } my_struct9;

    inline void try_this7_now(__global const uchar * IN_DATA , const uint IN_len_DATA , __global uchar * OUT_DATA){
        for (unsigned int i=0; i<IN_len_DATA ; i++)
            OUT_DATA[i] = IN_DATA[i];
    }

    __kernel void try_this7(__global const my_struct9 * pS_IN_DATA , const uint IN_len , __global my_struct9 * pS_OUT){
        uint idx = get_global_id(0);
        for (unsigned int i=0; i<idx; i++)
            try_this7_now(pS_IN_DATA[i].uc, IN_len, pS
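A kernel cannot receive a Python list of variable-length strings directly, so one common approach is to pack the strings into fixed-width records that match the struct layout. The sketch below shows only the host-side packing in plain Python. The 40-byte record size mirrors `uchar uc[40]` from the question, while the function names, the zero padding, and the UTF-8 encoding are assumptions made for illustration.

```python
RECORD_SIZE = 40  # matches `uchar uc[40]` in my_struct9


def pack_strings(strings, record_size=RECORD_SIZE):
    """Pack strings into one contiguous, zero-padded byte buffer."""
    buf = bytearray(len(strings) * record_size)   # starts out as all zero bytes
    for i, text in enumerate(strings):
        raw = text.encode("utf-8")
        if len(raw) > record_size:
            raise ValueError(f"string {i} does not fit in {record_size} bytes")
        start = i * record_size
        buf[start:start + len(raw)] = raw         # the rest of the record stays zero
    return bytes(buf)


def unpack_strings(buf, record_size=RECORD_SIZE):
    """Inverse of pack_strings: split into records and strip the zero padding."""
    return [
        buf[offset:offset + record_size].rstrip(b"\0").decode("utf-8")
        for offset in range(0, len(buf), record_size)
    ]


if __name__ == "__main__":
    blob = pack_strings(["abc", "hello"])
    print(len(blob), unpack_strings(blob))
```

The resulting bytes object is one flat block of `len(strings) * 40` bytes, which is the shape a `__global const my_struct9 *` argument expects. It could then be handed to whatever buffer-creation call the host code already uses, and the output buffer decoded with `unpack_strings` after it is read back.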

Build OpenCV with OpenCL Enabled and ON

I'm trying to run simple code with OpenCL enabled on OpenCV. I've read the intro to OCL documentation and, as instructed, I built OpenCV with this flag: WITH_OPENCL=ON. I did this by running cmake -DWITH_OPENCL=ON and then building OpenCV on Mac (OS X Yosemite). I then tried to run my code, but, according to my code, haveOpenCL() is false.

    #include <iostream>
    #include <fstream>
    #include <string>
    #include <iterator>
    #include <opencv2/opencv.hpp>
    #include <opencv2/core/ocl.hpp>

    using namespace std;

    int main () {
        cv::ocl::setUseOpenCL(true);
        cout << cv::ocl::haveOpenCL() << endl;
        if ( ! cv::ocl: