opencl

Fast way to swap endianness using OpenCL

Posted by 末鹿安然 on 2019-12-11 19:44:53
Question: I'm reading and writing lots of FITS and DNG images which may contain data whose endianness differs from that of my platform and/or OpenCL device. Currently I swap the byte order in the host's memory if necessary, which is very slow and requires an extra step. Is there a fast way to pass a buffer of int/float/short with the wrong endianness to an OpenCL kernel? Using an extra kernel run just to fix the endianness would be OK; an overhead-free auto-fixing read/write operation would be perfect.
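One possibility is to do the swap on the device with a tiny fix-up kernel enqueued right after the upload. The sketch below is only an illustration (the kernel name and the assumption of 32-bit elements are mine, not from the question); because it shuffles raw bits, the same kernel works for int and float data alike, and a 16-bit variant for short would swap two bytes instead of four.

// Hypothetical sketch: byte-swap each 32-bit word of a buffer in place.
// Launch with one work-item per element (global size = element count).
__kernel void swap_endian_u32(__global uint *buf)
{
    size_t gid = get_global_id(0);
    uint v = buf[gid];
    buf[gid] = (v >> 24)
             | ((v >> 8)  & 0x0000FF00u)
             | ((v << 8)  & 0x00FF0000u)
             |  (v << 24);
}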

How to run Apple's OpenCL “Hello World” example in Xcode

Posted by 早过忘川 on 2019-12-11 19:44:14
Question: Apple provides an OpenCL "Hello World" example, which can be downloaded as a .zip file from the following page: https://developer.apple.com/library/mac/samplecode/OpenCL_Hello_World_Example/Introduction/Intro.html I downloaded it, opened the project in Xcode, and clicked Run. The build succeeded, but I got the following error message: Error: Failed to create a device group! I would appreciate any advice on how to get a simple OpenCL example running on my Mac. In case it is diagnostically …
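For reference, the "Failed to create a device group" message in that sample is printed when its clGetDeviceIDs call fails, typically because no device of the requested type (GPU by default) is visible. A minimal, hedged sketch of the usual workaround — this is not Apple's code, just an illustration of falling back to the CPU device:

#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_device_id device;
    /* Ask for a GPU first; if none is available, fall back to the CPU. */
    cl_int err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS)
        err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
    if (err != CL_SUCCESS) {
        printf("Error: Failed to create a device group!\n");
        return 1;
    }
    printf("Found a usable OpenCL device.\n");
    return 0;
}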

clEnqueueMarkerWithWaitList usage

Posted by 荒凉一梦 on 2019-12-11 18:51:01
Question: I recently read a book about OpenCL and queue synchronization methods, but I didn't understand the difference between using clEnqueueMarkerWithWaitList and clWaitForEvents. For example, in the code below, execution of the kernel_2 instance depends on the two buffers clmem_A and clmem_B having been written to the device. I don't understand what the difference is if we delete the clEnqueueMarkerWithWaitList command and change the argument of clWaitForEvents to write_event. cl_event write_event[2]; …
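To make the comparison concrete, here is a hedged sketch of the two variants being discussed (buffer, kernel, and size names are placeholders taken loosely from the question, not complete code):

/* Assumed to exist already: queue, clmem_A, clmem_B, host_A, host_B,
 * size_A, size_B, kernel_2 and global_size. */
cl_event write_event[2];
cl_event marker_event;

clEnqueueWriteBuffer(queue, clmem_A, CL_FALSE, 0, size_A, host_A,
                     0, NULL, &write_event[0]);
clEnqueueWriteBuffer(queue, clmem_B, CL_FALSE, 0, size_B, host_B,
                     0, NULL, &write_event[1]);

/* Variant 1: a marker gathers the two write events inside the queue;
 * the kernel launch then waits on the marker. The host never blocks. */
clEnqueueMarkerWithWaitList(queue, 2, write_event, &marker_event);
clEnqueueNDRangeKernel(queue, kernel_2, 1, NULL, &global_size, NULL,
                       1, &marker_event, NULL);

/* Variant 2: the host thread blocks until both writes have completed,
 * after which the kernel can be enqueued with no wait list at all. */
clWaitForEvents(2, write_event);
clEnqueueNDRangeKernel(queue, kernel_2, 1, NULL, &global_size, NULL,
                       0, NULL, NULL);

The resulting ordering is the same, but the marker keeps the dependency entirely on the command-queue side, while clWaitForEvents stalls the calling host thread until the writes complete.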

/usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available

Posted by 孤街浪徒 on 2019-12-11 18:35:35
Question: When I run computecpp_info, I get: $ /usr/local/computecpp/bin/computecpp_info /usr/local/computecpp/bin/computecpp_info: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/local/computecpp/bin/computecpp_info) /usr/local/computecpp/bin/computecpp_info: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/local/computecpp/bin/computecpp_info) *********************************************************************** …

clinfo device cpu-gpu info [closed]

Posted by 女生的网名这么多〃 on 2019-12-11 17:55:01
Question: Can anyone tell me why the maximum work-item sizes and the compute-unit count for my GPU are lower than for my CPU? Does that mean the CPU's performance is better than the GPU's? CPU: Intel Core i7, 2.2 GHz. GPU: AMD Radeon HD 6700M. Number of platforms: 2 Platform Profile: FULL_PROFILE Platform Version: OpenCL 1.2 AMD-APP (1084.2) Platform Name: AMD …
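These limits describe the maximum number of work-items allowed per dimension of a work-group, not how fast the device is, so a smaller value on the GPU does not by itself imply worse performance, and compute-unit counts of a CPU and a GPU are not directly comparable either. The same numbers can be queried programmatically; a minimal hedged sketch (assuming the device reports the usual three work-item dimensions):

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    size_t max_items[3];      /* assumes 3 work-item dimensions */
    cl_uint compute_units;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(max_items), max_items, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(compute_units), &compute_units, NULL);
    printf("max work items: %zu x %zu x %zu, compute units: %u\n",
           max_items[0], max_items[1], max_items[2], compute_units);
    return 0;
}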

String operations on OpenCL

Posted by 感情迁移 on 2019-12-11 17:47:00
Question: EDIT: I did some tests on the char array inputs to the kernel. I noticed rather odd behaviour; consider the kernel program and accompanying PyOpenCL code: #!/usr/bin/env python3 import pyopencl as cl import numpy as np import seq # Write down our kernel as a multiline string. kernel = """ __kernel void dragon( const int N, __global char *AplusB, __global char *AminusB, __global char *plusMinus, __global char *minusMinus, __global char *output ) { int idx = get_global_id(0); if (idx < N){ …

Large for loop crashing on an Nvidia GeForce GT 610

Posted by 自作多情 on 2019-12-11 16:46:27
Question: I have an OpenCL kernel with two nested loops. It works fine up to a certain number of iterations, but crashes when the number of iterations is increased. The loop essentially does not create any new data (i.e., there is no global memory overflow etc.); it just iterates more times. What can I do to allow more iterations? Has anyone encountered this problem? Thanks a lot. Answer 1: Are you running this on Windows? Windows has a watchdog timer mechanism that restarts the display driver if it …
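A common workaround for watchdog-timeout crashes is to split one long-running launch into several shorter ones so that each finishes within the driver's time window. A hedged host-side sketch with placeholder names (the kernel is assumed to take a start index and an iteration count as its first two arguments):

/* Hedged sketch: run total_iters iterations as several short launches.
 * Assumed to exist: queue and kernel, with buffer arguments set elsewhere. */
size_t global_size = 1024;        /* placeholder work size */
cl_int total_iters = 1000000;     /* placeholder total iteration count */
cl_int chunk       = 10000;       /* iterations per launch */

for (cl_int start = 0; start < total_iters; start += chunk) {
    cl_int iters = (total_iters - start < chunk) ? (total_iters - start) : chunk;
    clSetKernelArg(kernel, 0, sizeof(cl_int), &start);
    clSetKernelArg(kernel, 1, sizeof(cl_int), &iters);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);
    clFinish(queue);   /* give the driver a chance to service the display */
}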

OpenCL kernel does not work as expected (pyopencl)

Posted by ⅰ亾dé卋堺 on 2019-12-11 16:07:57
Question: I wrote an OpenCL function to increment 64-bit floating-point values in an array, but the results differ between the CPU and the GPU. import numpy as np import pyopencl as cl CL_INC = ''' __kernel void inc_f64(__global const double *a_g, __global double *res_g) { int gid = get_global_id(0); res_g[gid] = a_g[gid] + 1.0; } ''' def test(dev_type): ctx = cl.Context(dev_type=dev_type) queue = cl.CommandQueue(ctx) mf = cl.mem_flags prg = cl.Program(ctx, CL_INC).build() in_py = np.array([1.0, 2.0, 3.0, 4 …
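One frequent culprit with double on GPUs is missing or partial cl_khr_fp64 support. As a hedged variant of the question's own kernel (same body, only the pragma added), requesting the extension explicitly should make the build fail loudly on devices that cannot do proper 64-bit arithmetic instead of silently producing different results:

// Same kernel as in the question, with the fp64 extension requested
// explicitly so that building fails on devices without cl_khr_fp64.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void inc_f64(__global const double *a_g, __global double *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + 1.0;
}

Whether a device supports doubles at all can also be checked from the host, e.g. by looking for cl_khr_fp64 in its extension string before selecting it.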

OpenCL load program from binary

Posted by 喜欢而已 on 2019-12-11 15:38:57
Question: I have the following very simple kernel in OpenCL: void kernel simple_add(global const int* A, global const int* B, global int* C){ C[get_global_id(0)]=A[get_global_id(0)]+B[get_global_id(0)]; }; I created a C++ program to load the kernel from a binary created from its source. The binary loads correctly (CL_SUCCESS), but it does not produce the correct result for the input. It prints changing garbage values like so: result: 538976310 538976288 538976288 538976288 538976288 790634528 796160111 …
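Garbage results after a successful-looking load are often a sign that a later step failed silently; in particular, a program created with clCreateProgramWithBinary still has to be built with clBuildProgram before kernels can be created from it. A hedged C-style sketch of that part of the host code (context, device, and the binary bytes are assumed to have been loaded already):

/* Assumed to exist: context, device, binary (unsigned char *) and
 * binary_size (size_t), loaded from the file written earlier. */
cl_int bin_status, err;
cl_program program = clCreateProgramWithBinary(context, 1, &device,
                                               &binary_size,
                                               (const unsigned char **)&binary,
                                               &bin_status, &err);
if (err != CL_SUCCESS || bin_status != CL_SUCCESS) { /* handle error */ }

/* The binary must still be built (finalized) for the device. */
err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
if (err != CL_SUCCESS) { /* handle error, e.g. CL_INVALID_BINARY */ }

cl_kernel kernel = clCreateKernel(program, "simple_add", &err);

Unset kernel arguments or reading the result buffer before the kernel has finished produce the same kind of changing garbage, so checking the return value of every clSetKernelArg and clEnqueue* call usually narrows the problem down.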

How do I develop a CUDA application on my ATI card, to be executed later on NVIDIA?

Posted by 心已入冬 on 2019-12-11 15:31:20
Question: My computer has an ATI graphics card, but I need to code an algorithm I already have in CUDA to accelerate the process. Is that even possible? If yes, does anyone have a link or tutorial covering everything from setting up my IDE to coding a simple image-processing example or passing an image? I also considered OpenCL, but I have not found any information on how to do anything with it. Answer 1: This answer is more directed toward the part "I also considered OpenCL but I have not found any information how to do anything with it" …
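To give a feel for the OpenCL side (independent of any linked tutorials, and purely an illustrative sketch), a minimal image-processing kernel that converts an interleaved RGBA byte image to 8-bit grayscale might look like this:

// Illustrative sketch: one work-item per pixel, RGBA8 in, gray8 out.
__kernel void rgba_to_gray(__global const uchar4 *src,
                           __global uchar *dst,
                           const int num_pixels)
{
    int i = get_global_id(0);
    if (i < num_pixels) {
        uchar4 p = src[i];
        // Integer approximation of luma = 0.299 R + 0.587 G + 0.114 B.
        dst[i] = (uchar)((77 * p.x + 150 * p.y + 29 * p.z) >> 8);
    }
}

The host side is the standard OpenCL boilerplate: create a context and queue, copy the image into a buffer, launch the kernel with one work-item per pixel, and read the result back.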