opencl

How to use the Hadoop MapReduce framework for an OpenCL application?

我的梦境 submitted on 2019-12-04 06:07:28
Question: I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and execute the application on multiple nodes. I am using the MapReduce framework and have divided my basic algorithm into two parts, i.e. 'Map' and 'Reduce'. I have never worked with Hadoop before, so I have some questions: Do I have to write my application in Java only in order to use Hadoop and the MapReduce framework? I have written kernel

OpenCL local memory size and number of compute units

独自空忆成欢 submitted on 2019-12-04 05:24:38
Each GPU device (AMD, NVIDIA, or any other) is split into several Compute Units (MultiProcessors), each of which has a fixed number of cores (VertexShaders/StreamProcessors). So, one has (Compute Units) x (VertexShaders/compute unit) simultaneous processors to compute with, but there is only a small fixed amount of __local memory (usually 16KB or 32KB) available per MultiProcessor. Hence, the exact number of these multiprocessors matters. Now my questions: (a) How can I know the number of multiprocessors on a device? Is this the same as CL_DEVICE_MAX_COMPUTE_UNITS? Can I deduce it from

Is clGetKernelWorkGroupInfo - CL_KERNEL_WORK_GROUP_SIZE the size OpenCL uses when not specifying it in clEnqueueNDRangeKernel?

一个人想着一个人 submitted on 2019-12-04 05:03:52
I read that when the work-group size is not specified when enqueueing a kernel, OpenCL chooses one for me, e.g.: //don't know which workgroup size OpenCL will use! clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, NULL); Is there a way to get the work-group size OpenCL is using here? Is the work-group size OpenCL chooses the one that is returned by clGetKernelWorkGroupInfo? Thank you in advance! CL_KERNEL_WORK_GROUP_SIZE is the MAXIMUM work-group size you can get, which depends on the memory requirements of your kernel. If you do not specify a work-group size when

A nice starter kit for OpenCL? [closed]

人盡茶涼 submitted on 2019-12-04 05:01:27
This question was closed as off-topic 6 years ago. I've got some experience with OpenGL and its programmable pipeline. I'd like to give OpenCL a try, though. Could somebody suggest a nice integrated kit for working with OpenCL? I know only of Quartz Composer, which looks nice, but it's Mac-only. Does anyone know if it supports hand-editing of OpenCL kernels, or is it all only through the GUI? Any other Linux / Windows alternative? Quartz Composer does

When to use the OpenCL API scalar data types?

你。 submitted on 2019-12-04 03:45:45
Question: I have been having trouble understanding when to use the OpenCL API data types like cl_float, cl_uchar, etc., which can be found here: http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/scalarDataTypes.html The examples I have seen that involve copying a buffer to the device look like this: float data[DATA_SIZE]; // original data set given to device //Create the input and output arrays in device memory for our calculation input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float)

CL_INVALID_WORK_GROUP_SIZE error

帅比萌擦擦* submitted on 2019-12-04 03:36:01
I have this code, about which I already posted some time ago. Today I got my kernel running with a typedef struct in a little test program, but clEnqueueNDRangeKernel gives an invalid work group size error. According to the Khronos website, this can have 3 causes. (1) Global work size is not divisible by the local work size. In my code, it is divisible. (2) Local work size is bigger than the GPU can handle. My local work size is 128, way under the reported maximum of 1024. (3) Something to do with local work size that is NULL. My local work size isn't NULL, it's 128. I've searched the internet
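The three causes listed in the question above can be checked on the host before enqueueing. The following is only a minimal sketch for a single dimension: the enum, its values, and the function name `check_work_sizes` are hypothetical and not part of the OpenCL API, and the maximum is assumed to be a value the caller has already queried from the device or kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are NOT OpenCL constants. */
typedef enum {
    WG_OK = 0,
    WG_LOCAL_IS_NULL,   /* no local size given: the runtime picks one */
    WG_NOT_DIVISIBLE,   /* global size is not a multiple of the local size */
    WG_EXCEEDS_MAX      /* local size is larger than the reported maximum */
} wg_check;

/* Check one NDRange dimension before calling clEnqueueNDRangeKernel.
 * max_work_group_size is assumed to have been queried by the caller. */
static wg_check check_work_sizes(size_t global, const size_t *local,
                                 size_t max_work_group_size)
{
    assert(max_work_group_size > 0);
    if (local == NULL)
        return WG_LOCAL_IS_NULL;
    if (*local == 0 || global % *local != 0)
        return WG_NOT_DIVISIBLE;
    if (*local > max_work_group_size)
        return WG_EXCEEDS_MAX;
    return WG_OK;
}
```

With the numbers from the question (local size 128, reported maximum 1024), the helper returns `WG_OK` for any global size that is a multiple of 128, which suggests the reported maximum itself may not be the limit that applies to that particular kernel.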

Is OpenCL available on iOS?

核能气质少年 submitted on 2019-12-03 23:32:03
Question: I found this thread on the forum, Are either the IPad or IPhone capable of OpenCL?, but it is quite old. Also, from what I can gather, OpenCL is available to the system libraries of iOS but not to the public. Is there more information in this regard, or any update? Answer 1: Even when using OpenCL as a private framework, on iOS it won't give you the benefits of the GPU (or others like DSPs/FPGAs, if present). It just gives you the multiple cores available on the ARM processor. I ran the code below to verify the OpenCL devices

What's the correct and most efficient way to use the mapped (zero-copy) memory mechanism in the NVIDIA OpenCL environment?

∥☆過路亽.° submitted on 2019-12-03 22:21:22
NVIDIA provides an example of how to profile bandwidth between host and device; you can find the code here: https://developer.nvidia.com/opencl (search for "bandwidth"). The experiment was carried out on a 64-bit Ubuntu 12.04 computer. I am inspecting pinned memory and the mapped access mode, which can be tested by invoking: ./bandwidthtest --memory=pinned --access=mapped The core test loop for host-to-device bandwidth is at around lines 736~748. I also list it here with some added comments and context code: //create a buffer cmPinnedData in host cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM

Neutral element for min() and max() in OpenCL reduction

梦想的初衷 submitted on 2019-12-03 21:21:51
I'm doing a reduction (finding the minimum and maximum) of a float[] array on a GPU through OpenCL. I'm loading some elements from global memory into local memory for each workgroup. When the global size isn't a multiple of the workgroup size, I pad the global size so that it becomes a multiple of the workgroup size. Work-items past the end of the array put the neutral element of the reduction into local memory. But what should that neutral element be for max(), the maximum function? The OpenCL documentation gives MAXFLOAT, HUGE_VALF and INFINITY as very large positive (or unsigned)
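The padding scheme described in the question can be emulated on the host to see why the choice of neutral element matters. This is only a sketch in plain C, not OpenCL kernel code: the function names are hypothetical, and it assumes the padded work-items contribute `-INFINITY` for a max reduction (the mirror image, `+INFINITY`, would serve for min), since comparing any non-NaN value against negative infinity leaves that value unchanged.

```c
#include <assert.h>
#include <math.h>    /* INFINITY macro only; no libm functions are called */
#include <stddef.h>

/* Round the global size up to the next multiple of the work-group size. */
static size_t padded_global_size(size_t n, size_t group_size)
{
    assert(group_size > 0);
    return (n + group_size - 1) / group_size * group_size;
}

/* Host-side emulation of a max reduction over the padded index range.
 * Indices >= n stand in for work-items past the end of the array and
 * contribute -INFINITY, which never wins a "greater than" comparison. */
static float padded_max(const float *data, size_t n, size_t group_size)
{
    size_t padded = padded_global_size(n, group_size);
    float result = -INFINITY;
    for (size_t i = 0; i < padded; ++i) {
        float v = (i < n) ? data[i] : -INFINITY;
        if (v > result)
            result = v;
    }
    return result;
}
```

An input consisting only of negative numbers is a useful check: padding with 0 would produce a wrong maximum of 0, whereas padding with `-INFINITY` returns the largest real element.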

Rotating hundreds of JPEGs in seconds rather than hours

試著忘記壹切 submitted on 2019-12-03 20:14:17
Our computer receives hundreds of images at a time, and we need to rotate and resize them as fast as possible. Rotation is by 90, 180 or 270 degrees. Currently we are using the command-line tool GraphicsMagick to rotate the images. Rotating one image (5760*3840 ~ 22MP) takes around 4 to 7 seconds. The following Python code sadly gives us similar results: import cv img = cv.LoadImage("image.jpg") timg = cv.CreateImage((img.height,img.width), img.depth, img.channels) # transposed image # rotate counter-clockwise cv.Transpose(img,timg) cv.Flip(timg,timg,flipMode=0) cv.SaveImage(
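The transpose-then-flip sequence in the Python excerpt amounts to a pure index remapping, which can be written as a single pass over the pixels. The sketch below is a hypothetical plain-C illustration for a single-channel, row-major image. It assumes the flip is a vertical (top-to-bottom) flip, as the "rotate counter-clockwise" comment suggests, and it says nothing about JPEG decoding or encoding, which may well dominate the timings quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate a single-channel, row-major image 90 degrees counter-clockwise.
 * src has `height` rows of `width` pixels; dst must hold `width` rows of
 * `height` pixels. Equivalent to a transpose followed by a vertical flip. */
static void rotate90_ccw(const unsigned char *src, unsigned char *dst,
                         size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);
    for (size_t r = 0; r < width; ++r) {        /* rows of the output */
        for (size_t c = 0; c < height; ++c) {   /* columns of the output */
            /* transpose: T[r][c] = src[c][r]; flip: dst[r][c] = T[width-1-r][c] */
            dst[r * height + c] = src[c * width + (width - 1 - r)];
        }
    }
}
```

For a 2-row by 3-column input `{1,2,3, 4,5,6}`, the output is the 3-row by 2-column image `{3,6, 2,5, 1,4}`: the top-right pixel moves to the top-left corner.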