opencl | 易学教程

Detecting and recovering from Windows TDR?

Question: I've run into an odd issue with some OpenCL code that I'm working on: every once in a blue moon, Windows TDR will kick in and reset the GPU. The offending kernel runs for only 150 ms and will run thousands of times (over the course of many hours) before the TDR kills it off, so I'm certain that the kernel itself isn't to blame. My concern is that once the TDR kicks in, the kernel dies and the program is stuck in an eternal state of limbo. From what I can tell the call to clFinish never

OpenCL on Linux with integrated Intel graphics chip

Question: I would like to use OpenCL on Debian 8. I read on this page that Intel's GPUs are not supported on Linux. (The article is from 2011, so I hope it is out of date.) I already installed OpenCL nonetheless and can compile and run the code found here. As to my hardware: my processor is Intel(R) Core(TM) i7-4500 CPU @ 1.80GHz, and lspci | grep VGA outputs 00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09). So to clarify: I want to know if it is

OpenCL: Why can't a pointer to a pointer be passed as an argument to a kernel function?

Question: Hi, I just want some clarification on why we cannot pass a 2D array pointer as an argument to a kernel. Why is it not allowed? What happens internally if I use one as an argument? (As far as I know, the code will give some error.) Please help. Answer 1: Because in OpenCL 1.x the device has a separate address space. Kernels executing on the device wouldn't know what to do with a pointer that is only useful in the host address space. Note that in OpenCL 2.0 Shared Virtual Memory (SVM) removes
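
A common workaround, which the excerpt above does not itself mention, is to flatten the 2D array into one contiguous block on the host and index it as `r * cols + c`, so that a single buffer carries the data and no host pointer ever reaches the device. The sketch below is host-side C only; `flatten_2d` and `flat_index` are hypothetical helper names chosen for illustration, and the OpenCL buffer creation itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy an array of row pointers (rows x cols) into one
 * contiguous block. A contiguous block is what a single OpenCL buffer can
 * hold; the row pointers themselves are host addresses and stay behind.
 * Returns NULL if allocation fails. The caller frees the result. */
float *flatten_2d(float *const *rows_in, size_t rows, size_t cols)
{
    assert(rows_in != NULL && rows > 0 && cols > 0);

    float *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return NULL;

    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            flat[r * cols + c] = rows_in[r][c];

    return flat;
}

/* Index arithmetic a kernel would use in place of a[r][c]. */
size_t flat_index(size_t r, size_t c, size_t cols)
{
    return r * cols + c;
}
```

The flattened block, together with `cols` passed as a scalar argument, would then be what the kernel receives, and the kernel would compute the same `r * cols + c` offset.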

OpenCL: One program running on multiple devices

Question: I already found this: OpenCL: Running CPU/GPU multiple devices. But I still have three questions about how to run a program on multiple devices. (Q1) Is the recipe as follows? Create the devices you want to use. For every device, create a context. For every context, call clBuildProgram to build a program. For every program, call clCreateCommandQueue to build one command queue per context. For every context and for every function parameter, call clCreateBuffer. Or must I concatenate the command queues? (Q2) Has

Running OpenCL on hardware from mixed vendors

Question: I've been playing with the ATI OpenCL implementation in their Stream 2.0 beta. The OpenCL in the current beta only uses the CPU for now; the next version is supposed to support GPU kernels. I downloaded Stream because I have an ATI GPU in my work machine. I write software that would benefit hugely from using the GPU. However, this software runs on customer machines, and I don't have the luxury (as many scientific computing environments have) to choose the exact hardware to develop for,

Using structure as buffer holder

Question: In my current OpenCL implementation, I wanted to save time with arguments: avoid passing them every time I use a buffer inside a kernel, and have a shorter argument list for my kernel. So I made a structure (workspace) that holds the pointer to the buffer in device memory. The struct acts like an object with member variables that you want to access over time and keep alive for the whole execution. I never had a problem on an AMD GPU or even on the CPU. But Nvidia is causing a lot of

Timed interval always evaluates to zero

Question: The code on the host is like this: #include <time.h> clock_t start, finish; start = clock(); ret = clEnqueueNDRangeKernel(.........); finish = clock(); double time = (double)(finish - start) / (double)(CLOCKS_PER_SEC); Why is finish - start always 0? Is it because of low resolution, or is there something wrong with my timer code? Answer 1: Enqueueing a kernel is very cheap, since the function call can return before the kernel is executed. You could use the event generated by the clEnqueueNDRangeKernel to

LLVM front end register class error OpenCL — GPU target

Question: I've recently been encountering this error when compiling OpenCL kernel files with my LLVM_IR pass: aoc: ../../../TargetRegisterInfo.cpp:89: const llvm::TargetRegisterClass* llvm::TargetRegisterInfo::getMinimalPhysRegClass(unsigned int, llvm::EVT) const: Assertion `BestRC && "Couldn't find the register class"' failed. I'm not sure what this means. What I've read in the documentation doesn't make a lot of sense. Basically, does it mean the backend doesn't know what type to place into the register?

OpenCL create subdevices CL_DEVICE_PARTITION_FAILED

Question: I'm stuck getting clCreateSubDevices to work: CL_DEVICE_PARTITION_FAILED is always returned and I have no clue how to solve this problem. I'm trying to create a subdevice with one core only. Here is the code; do you see anything wrong with it? Thanks! Here are the function signatures: clCreateSubDevices, clGetPlatformIDs, clGetDeviceIDs cl_platform_id platform_id = NULL; cl_device_id device_id = NULL; cl_uint ret_num_devices; cl_uint ret_num_platforms; cl_int ret = clGetPlatformIDs(1,

NDRange: number of work-items

Question: I'm trying to copy an image using OpenCL: std::string kernelCode = "void kernel copy(global const int* image, global int* result)" "{" "result[get_global_id(0)] = image[get_global_id(0)];" "}"; The image contains 200 * 300 pixels. The maximum number of work-items is 4100 according to CL_DEVICE_MAX_WORK_GROUP_SIZE. In the queue: int size = _originalImage.width() * _originalImage.height(); //... queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(size), cl::NullRange); Gives
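
For context, CL_DEVICE_MAX_WORK_GROUP_SIZE limits the number of work-items in a single work-group, not the global size, so a global size of 200 * 300 = 60000 can exceed it. The excerpt is cut off before the error, so the following is not a diagnosis of this question. It only sketches arithmetic that is often used when an explicit local size is chosen: rounding the global size up to a multiple of the local size. `round_up_global_size` is a hypothetical helper name, and the local sizes in the usage note are assumed values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest multiple of local_size that is >= total_items.
 * When the global size is padded like this, the kernel needs a bounds check
 * (for example, return early if get_global_id(0) >= total_items) so the extra
 * work-items do not touch memory past the end of the buffers. */
size_t round_up_global_size(size_t total_items, size_t local_size)
{
    assert(local_size > 0);

    size_t groups = (total_items + local_size - 1) / local_size;
    return groups * local_size;
}
```

With the 60000 pixels from the question and an assumed local size of 256, the helper returns 60160 (235 groups of 256). With an assumed local size of 100 it returns 60000 unchanged, because 100 divides 60000 exactly.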