OpenCL

OpenCL bytecode running on another card

Submitted by 笑着哭i on 2020-01-05 02:37:28
Question: I have a program that uses OpenCL for calculation. The OpenCL code is large, and compiling it takes about 2 minutes at 100% CPU load, so of course I save the binary result of the compilation, and the second launch loads the OpenCL program from that binary. Can I use the same binary on another video card with the same chip but different characteristics (RAM, clock, etc.)? Answer 1: As far as the OpenCL specification is concerned, you only have guarantees that a program binary can be re-used on the same device on which it was created. In
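In practice this means a cached binary should be treated as device-specific: try to load it, and fall back to compiling from source if the load or build fails. Below is a minimal sketch of that pattern (not from the answer above; load_or_build is a hypothetical helper, and error handling is trimmed):

    /* Try a cached binary first; recompile from source on any failure. */
    cl_program load_or_build(cl_context context, cl_device_id device,
                             const unsigned char *binary, size_t binary_size,
                             const char *source)
    {
        cl_int err, binary_status;
        cl_program program = NULL;

        if (binary != NULL) {
            program = clCreateProgramWithBinary(context, 1, &device,
                                                &binary_size, &binary,
                                                &binary_status, &err);
            if (err == CL_SUCCESS &&
                clBuildProgram(program, 1, &device, NULL, NULL, NULL) == CL_SUCCESS)
                return program;            /* cached binary still valid */
            if (program) clReleaseProgram(program);
        }

        /* Cache miss or stale binary: take the slow source-compile path. */
        program = clCreateProgramWithSource(context, 1, &source, NULL, &err);
        clBuildProgram(program, 1, &device, NULL, NULL, NULL);
        return program;
    }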

Many OpenCL SDKs. Which of them should I choose?

Submitted by 青春壹個敷衍的年華 on 2020-01-05 02:31:33
Question: On my computer running Windows 7 I have OpenCL SDKs from three vendors: Intel, NVIDIA, and AMD. I build my application with each of them, and as output I get three different binaries, for example: my_app_intel_x86, my_app_amd_x86, my_app_nvidia_x86. These binaries differ in two ways: they use different SDKs in the linkage process, and they look for different OpenCL platform names at runtime. Can I use only one SDK and check the platform at run time? Answer 1: SDKs give debugging tools, a
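The key point is that the SDK only matters at build time; at run time the ICD loader exposes every installed platform to a single binary. A sketch of that enumeration, using only the standard API:

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        char name[256];

        /* One binary, linked against any vendor's ICD loader, can see
           every installed platform (Intel, NVIDIA, AMD, ...) at run time. */
        clGetPlatformIDs(8, platforms, &num_platforms);
        for (cl_uint i = 0; i < num_platforms; ++i) {
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof(name), name, NULL);
            printf("platform %u: %s\n", i, name);
        }
        return 0;
    }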

Copying bytes without memcpy

Submitted by 人走茶凉 on 2020-01-04 14:58:10
Question: I have several variables of different types stored in a char array. Normally I would write them to the array this way: int a = 5; memcpy(offset, (char*)&a, sizeof(int)); However, memcpy doesn't work in OpenCL kernels. What would be the easiest way to do the same without this function? Answer 1: You can easily enough provide your own mymemcpy: void mymemcpy(unsigned char *dest, const unsigned char *src, size_t N) { size_t i; for(i=0;i<N;i++) dest[i] = src[i]; } However it's not very efficient because most
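When the alignment of the target offset is known, a pointer cast stores the whole value in one transaction instead of byte by byte. A sketch of that alternative in OpenCL C (not from the answer above; it assumes the offset is 4-byte aligned, otherwise the byte-copy loop remains the safe choice):

    /* OpenCL C kernel sketch: write an int into a byte buffer with one
       aligned 32-bit store instead of four byte stores. Assumes
       byte_offset is 4-byte aligned. */
    __kernel void pack_int(__global uchar *buf, uint byte_offset, int value)
    {
        __global int *dst = (__global int *)(buf + byte_offset);
        *dst = value;
    }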

How to remove CL_INVALID_PLATFORM error in OpenCL code?

Submitted by 家住魔仙堡 on 2020-01-04 08:05:47
Question: Doing simple matrix multiplication using OpenCL: // Multiply two matrices A * B = C #include <stdlib.h> #include <stdio.h> #include <math.h> #include <oclUtils.h> #define WA 3 #define HA 3 #define WB 3 #define HB 3 #define WC 3 #define HC 3 // Allocates a matrix with random float entries. void randomInit(float* data, int size) { for (int i = 0; i < size; ++i) data[i] = rand() / (float)RAND_MAX; } ///////////////////////////////////////////////////////// // Program main ///////////////////////
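CL_INVALID_PLATFORM from clCreateContext usually means the context properties name a platform that was never actually queried (or a zeroed handle). A sketch of the usual fix, querying a real platform ID first and passing it explicitly (variable names are illustrative):

    /* Query a real platform ID before creating the context; passing a
       stale or zeroed platform in the properties list is the classic
       cause of CL_INVALID_PLATFORM. */
    cl_platform_id platform;
    cl_int err = clGetPlatformIDs(1, &platform, NULL);

    cl_device_id device;
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context_properties props[] = {
        CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0
    };
    cl_context context = clCreateContext(props, 1, &device, NULL, NULL, &err);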

Performing many small matrix operations in parallel in OpenCL

Submitted by 旧城冷巷雨未停 on 2020-01-04 06:52:55
Question: I have a problem that requires me to do eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to perform eigendecomposition of one such matrix and then perform two matrix multiplications. Thus, the work that each thread has to do is rather minimal, and the full job should be highly parallelizable. Unfortunately, it seems all the available OpenCL LAPACKs are for delegating operations on large matrices to the GPU
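Since each matrix is tiny, the usual approach is to skip BLAS-style libraries entirely and let every work-item own one matrix in private memory. A sketch of that layout for the multiplication step (the eigendecomposition would follow the same one-matrix-per-thread pattern; 3x3 size and row-major packing are assumptions):

    /* One work-item multiplies one 3x3 matrix pair, entirely in private
       memory. Matrices are assumed packed row-major, 9 floats each. */
    __kernel void batch_matmul3x3(__global const float *A,
                                  __global const float *B,
                                  __global float *C)
    {
        size_t m = get_global_id(0);        /* which matrix this thread owns */
        const __global float *a = A + 9 * m;
        const __global float *b = B + 9 * m;
        __global float *c = C + 9 * m;

        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < 3; ++k)
                    acc += a[3 * i + k] * b[3 * k + j];
                c[3 * i + j] = acc;
            }
    }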

Basic Concepts of Parallel Computing

Submitted by 纵然是瞬间 on 2020-01-04 05:04:34
Parallel computing refers to the process of solving a computational problem by using multiple computing resources simultaneously; it is an effective means of increasing the computing speed and processing capacity of a computer system. Its basic idea is to have multiple processors cooperatively solve the same problem: the problem is decomposed into several parts, each of which is computed in parallel by an independent processor. A parallel computing system can be a purpose-built supercomputer containing many processors, or a cluster of independent computers interconnected in some way. The cluster processes the data in parallel and returns the results to the user.

Parallel computing is defined in contrast to serial computing. It can be divided into parallelism in time and parallelism in space: parallelism in time refers to pipelining, while parallelism in space refers to using multiple processors to execute computations concurrently.

Parallel computing research is mainly concerned with spatial parallelism. From the perspective of program and algorithm designers, parallel computing can further be divided into data parallelism and task parallelism. Spatial parallelism gave rise to two classes of parallel machines, classified according to Flynn's taxonomy: single instruction stream, multiple data streams (SIMD) and multiple instruction streams, multiple data streams (MIMD). The familiar serial machine is called single instruction stream, single data stream (SISD).

MIMD machines can be further divided into five common classes: parallel vector processors (PVP), symmetric multiprocessors (SMP), massively parallel processors (MPP), clusters of workstations (COW), and distributed shared-memory machines (DSM). Common parallel programming technologies today include MPI, OpenMP, OpenCL, OpenGL, and CUDA.
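To make the data-parallelism idea concrete, here is a minimal OpenCL kernel in which the same instruction stream is applied to many data elements at once, one element per work-item — the SIMD-style model described above (a sketch added for illustration, not part of the original article):

    /* Data parallelism: every work-item runs the same code on its own
       element of the input arrays. */
    __kernel void vector_add(__global const float *a,
                             __global const float *b,
                             __global float *out)
    {
        size_t i = get_global_id(0);
        out[i] = a[i] + b[i];
    }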

Determine limiting factor of OpenCL workgroup size?

Submitted by 烂漫一生 on 2020-01-04 02:44:10
Question: I am trying to run some OpenCL kernels written for desktop graphics cards on an embedded GPU with fewer resources. In particular, the desktop version assumes a work group size of at least 256 is always supported, but the Mali T628 ARM-based GPU only guarantees a work group size of 64 or more. Indeed, some kernels report a CL_KERNEL_WORK_GROUP_SIZE of only 64, and I can't figure out why. I checked CL_KERNEL_LOCAL_MEM_SIZE for the kernels in question and it is <2 KiB, whereas the CL_DEVICE_LOCAL_MEM_SIZE
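The candidate limits can be read back directly: clGetKernelWorkGroupInfo reports the per-kernel cap (which the compiler may lower below the device maximum, e.g. because of register pressure), while clGetDeviceInfo reports the hardware ceiling. A sketch of the relevant queries (print_wg_limits is an illustrative helper):

    #include <stdio.h>
    #include <CL/cl.h>

    /* Print the limits that can cap the usable work-group size. */
    void print_wg_limits(cl_kernel kernel, cl_device_id device)
    {
        size_t dev_max, krn_max, preferred;
        cl_ulong krn_local;

        clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof(dev_max), &dev_max, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(krn_max), &krn_max, NULL);
        clGetKernelWorkGroupInfo(kernel, device,
                                 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                                 sizeof(preferred), &preferred, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                                 sizeof(krn_local), &krn_local, NULL);

        printf("device max %zu, kernel max %zu, preferred multiple %zu, "
               "kernel local mem %llu bytes\n",
               dev_max, krn_max, preferred, (unsigned long long)krn_local);
    }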

OpenCL cleanup causes segfault

Submitted by 二次信任 on 2020-01-04 02:04:06
Question: I constructed my own little OpenCL example using different sources on the net. The actual kernel works, and I get the output I want, but the cleanup functions I found in one of the examples cause segfaults. What did I do wrong? #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <CL/cl.h> //opencl #define CL_CHECK(_expr) \ do { \ cl_int _err = _expr; \ if (_err == CL_SUCCESS) \ break; \ fprintf(stderr, "OpenCL Error: '%s' returned %d!\n", #_expr, (int)_err); \ abort(); \ }
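A common cause of segfaults in cleanup code copied from examples is releasing the same handle twice, or releasing a context before the objects created from it. A sketch of a safe teardown order (not the poster's code; handle names are illustrative, and each handle is released exactly once, finest-grained objects first):

    /* Release once, in roughly reverse order of creation; a double
       clRelease* on the same handle is a classic source of segfaults. */
    void cleanup(cl_command_queue queue, cl_kernel kernel, cl_program program,
                 cl_mem input_buf, cl_mem output_buf, cl_context context)
    {
        clFlush(queue);
        clFinish(queue);
        clReleaseKernel(kernel);
        clReleaseProgram(program);
        clReleaseMemObject(input_buf);
        clReleaseMemObject(output_buf);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
    }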

How to verify wavefront/warp size in OpenCL?

Submitted by 。_饼干妹妹 on 2020-01-03 18:59:13
Question: I am using an AMD Radeon HD 7700 GPU. I want to use the following kernel to verify that the wavefront size is 64. __kernel void kernel__test_warpsize( __global T* dataSet, uint size ) { size_t idx = get_global_id(0); T value = dataSet[idx]; if (idx<size-1) dataSet[idx+1] = value; } In the main program, I pass an array with 128 elements. The initial values are dataSet[i]=i. After the kernel, I expect the following values: dataSet[0]=0 dataSet[1]=0 dataSet[2]=1 ... dataSet[63]=62 dataSet[64]=63 dataSet
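Rather than inferring the wavefront width from race behaviour (the kernel above has a read-write race between neighbouring work-items, so its result is not guaranteed), the width can be queried from the host. A sketch using the standard kernel query, which in practice reports the wavefront/warp width on AMD and NVIDIA hardware (64 would be expected on an HD 7700):

    /* CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE matches the
       wavefront (warp) width on AMD and NVIDIA GPUs in practice. */
    size_t wavefront;
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(wavefront), &wavefront, NULL);
    printf("wavefront size: %zu\n", wavefront);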

enqueueWriteImage fail on GPU

Submitted by 僤鯓⒐⒋嵵緔 on 2020-01-03 05:14:12
Question: I am developing some kernels which work with image buffers. The problem is that when I create my Image2D by directly copying the data of the image, everything works well. If I try to enqueue a write to my image buffer, it won't work on my GPU. Here is a basic kernel: __kernel void myKernel(__read_only image2d_t in, __write_only image2d_t out) { const int x = get_global_id(0); const int y = get_global_id(1); const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_CLAMP_TO_EDGE | CLK
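A frequent reason an image write succeeds in one path but fails on the GPU is a wrong origin/region or row pitch in the enqueue call. A sketch of a correct write for a width x height image (variable names are illustrative; a row pitch of 0 tells the runtime the host rows are tightly packed):

    /* Write host pixels into an existing image2d; origin and region are
       given in pixels, and row_pitch = 0 means tightly packed rows. */
    size_t origin[3] = {0, 0, 0};
    size_t region[3] = {width, height, 1};
    cl_int err = clEnqueueWriteImage(queue, image, CL_TRUE /* blocking */,
                                     origin, region,
                                     0 /* row_pitch */, 0 /* slice_pitch */,
                                     host_pixels, 0, NULL, NULL);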