opencl | 易学教程

How to use the Hadoop MapReduce framework for an OpenCL application?

Question: I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and execute the application on multiple nodes. I am using the MapReduce framework and have divided my basic algorithm into two parts, i.e. 'Map' and 'Reduce'. I have never worked with Hadoop before, so I have some questions: Do I have to write my application in Java only in order to use Hadoop and the MapReduce framework? I have written kernel

OpenCL local memory size and number of compute units

Each GPU device (AMD, NVIDIA, or any other) is split into several Compute Units (MultiProcessors), each of which has a fixed number of cores (VertexShaders/StreamProcessors). So, one has (Compute Units) x (VertexShaders/compute unit) simultaneous processors to compute with, but there is only a small fixed amount of __local memory (usually 16KB or 32KB) available per MultiProcessor. Hence, the exact number of these multiprocessors matters. Now my questions: (a) How can I know the number of multiprocessors on a device? Is this the same as CL_DEVICE_MAX_COMPUTE_UNITS? Can I deduce it from

Is clGetKernelWorkGroupInfo - CL_KERNEL_WORK_GROUP_SIZE the size OpenCL uses when not specifying it in clEnqueueNDRangeKernel?

I read that when the work-group size is not specified when enqueueing a kernel, OpenCL chooses one for me, e.g.: //don't know which work-group size OpenCL will use! clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL); Is there a way to get the work-group size OpenCL is using here? Is the work-group size OpenCL chooses the one which is returned by clGetKernelWorkGroupInfo? Thank you in advance! CL_KERNEL_WORK_GROUP_SIZE is the MAXIMUM work-group size you can get, which depends on the memory requirements of your kernel. If you do not specify a work-group size when

A nice starter kit for OpenCL? [closed]

Closed 6 years ago as off-topic; not currently accepting answers. I've got some experience with OpenGL and its programmable pipeline. I'd like to give OpenCL a try, though. Could somebody propose a nice integrated kit for working with OpenCL? I know only of Quartz Composer, which looks nice, but it's Mac-only. Does anyone know if it supports hand-editing of OpenCL kernels, or is it all only through the GUI? Any other Linux / Windows alternative? Quartz Composer does

When to use the OpenCL API scalar data types?

Question: I have been having trouble understanding when to use the OpenCL API data types like cl_float, cl_uchar, etc., which can be found here: http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.html The examples I have seen that involve copying a buffer to the device look like this: float data[DATA_SIZE]; // original data set given to device //Create the input and output arrays in device memory for our calculation input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float)

CL_INVALID_WORK_GROUP_SIZE error

I have this code, about which I already posted something some time ago. Today I got my kernel running with a typedef struct in a little test program, but clEnqueueNDRangeKernel gives an invalid work group size error. This can have 3 causes, according to the Khronos website. (1) The global work size is not divisible by the local work size. In my code, it is divisible. (2) The local work size is bigger than the GPU can handle. My local work size is 128, way under the reported maximum of 1024. (3) Something to do with a local work size that is NULL. My local work size isn't NULL, it's 128. I've searched the internet
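
The first two causes listed above are plain host-side arithmetic, so they can be checked before the kernel is enqueued. The following is a minimal sketch in Python, not the asker's code: the function names are hypothetical, and `max_work_group_size` stands for whatever limit the caller has queried from the device or kernel.

```python
def round_up_global_size(n_items, local_size):
    """Return the smallest multiple of local_size that is >= n_items."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    return ((n_items + local_size - 1) // local_size) * local_size


def check_work_sizes(global_size, local_size, max_work_group_size):
    """Return a list of problems found by the two arithmetic checks.

    max_work_group_size is an assumed input: the limit the caller
    obtained from the OpenCL runtime for this device or kernel.
    """
    problems = []
    if global_size % local_size != 0:
        problems.append("global size is not divisible by local size")
    if local_size > max_work_group_size:
        problems.append("local size exceeds the maximum")
    return problems


if __name__ == "__main__":
    padded = round_up_global_size(1000, 128)
    print(padded, check_work_sizes(padded, 128, 1024))
```

Note that the per-kernel limit reported by clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE can be lower than the device-wide maximum, so it may be worth passing that value as the limit.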

Is OpenCL available on iOS?

Question: I found this thread on the forum, "Are either the IPad or IPhone capable of OpenCL?", but it is quite old. Also, from what I can gather, OpenCL is available to the system libraries of iOS but not to the public. Is there more info in this regard, or any update? Answer 1: Even when using OpenCL as a private framework, on iOS it won't give you the benefits of the GPU (or others like DSPs/FPGAs, if present). It just gives you the multiple cores available on the ARM processor. I ran the below code to verify the OpenCL devices

What's the correct and most efficient way to use the mapped (zero-copy) memory mechanism in the Nvidia OpenCL environment?

Nvidia provides an example of how to profile bandwidth between Host and Device; the code is available here: https://developer.nvidia.com/opencl (search "bandwidth"). The experiment was carried out on an Ubuntu 12.04 64-bit computer. I am inspecting pinned memory and the mapped access mode, which can be tested by invoking: ./bandwidthtest --memory=pinned --access=mapped The core test loop for Host-to-Device bandwidth is at around lines 736~748. I also list it here, with some added comments and context code: //create a buffer cmPinnedData in host cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM

Neutral element for min() and max() in OpenCL reduction

I'm doing a reduction (finding the minimum and maximum) of a float[] array on a GPU through OpenCL. I'm loading some elements from global memory into local memory for each workgroup. When the global size isn't a multiple of the workgroup size, I pad the global size, such that it becomes a multiple of the workgroup size. Work-items past the end of the array put the neutral element of the reduction into local memory. But what should that neutral element be for max(), the maximum function? The OpenCL documentation gives MAXFLOAT , HUGE_VALF and INFINITY as very large positive (or unsigned)
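
For floating-point values, the identity element of max is negative infinity and that of min is positive infinity, since max(x, -inf) = x and min(x, +inf) = x for every non-NaN x. The sketch below simulates the padded work-group reduction described above on the host in Python; it is an illustration under that assumption, not OpenCL kernel code, and `padded_max` and `group_size` are hypothetical names.

```python
import math


def padded_max(values, group_size):
    """Simulate a work-group max reduction with neutral-element padding."""
    # Round the length up to a multiple of the group size.
    padded_len = -(-len(values) // group_size) * group_size
    # Slots past the end of the data hold the neutral element for max.
    padded = list(values) + [-math.inf] * (padded_len - len(values))
    # One partial maximum per simulated work-group.
    partials = [
        max(padded[start:start + group_size])
        for start in range(0, padded_len, group_size)
    ]
    # Reducing an empty input yields the neutral element itself.
    return max(partials, default=-math.inf)


if __name__ == "__main__":
    print(padded_max([-3.0, -7.5, -1.25], 4))
```

Padding with 0 or with a small positive constant would give a wrong answer for all-negative inputs such as the one above, which is why the padding value has to be the identity of the operation.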

Rotating hundreds of JPEGs in seconds rather than hours

We have hundreds of images which our computer gets at a time and we need to rotate and resize them as fast as possible. Rotation is done by 90, 180 or 270 degrees. Currently we are using the command line tool GraphicsMagick to rotate the image. Rotating the images (5760*3840 ~ 22MP) takes around 4 to 7 seconds. The following Python code sadly gives us the same results: import cv img = cv.LoadImage("image.jpg") timg = cv.CreateImage((img.height,img.width), img.depth, img.channels) # transposed image # rotate counter-clockwise cv.Transpose(img,timg) cv.Flip(timg,timg,flipMode=0) cv.SaveImage(
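
The transpose-then-flip step in the excerpt above can be illustrated on a small grid. This is a sketch on plain nested lists, with hypothetical function names; it only shows the index mapping for rotations in 90-degree steps and says nothing about JPEG decoding or encoding cost.

```python
def rotate_ccw(image):
    """Rotate 90 degrees counter-clockwise: transpose, then flip vertically."""
    transposed = [list(row) for row in zip(*image)]
    return transposed[::-1]


def rotate_cw(image):
    """Rotate 90 degrees clockwise: flip vertically, then transpose."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_180(image):
    """Rotate 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in image[::-1]]


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 5, 6]]
    print(rotate_ccw(grid))
    print(rotate_cw(grid))
    print(rotate_180(grid))
```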