opencl

OpenCL buffer allocation and mapping best practice

Submitted by 萝らか妹 on 2020-01-16 07:41:13
Question: I am a little confused as to whether my code using OpenCL mapped buffers is correct. I have two examples, one using CL_MEM_USE_HOST_PTR and one using CL_MEM_ALLOC_HOST_PTR. Both work and run on my local machine and OpenCL devices, but I am interested in whether this is the correct way of doing the mapping, and whether it should work on all OpenCL devices. I am especially unsure about the USE_HOST_PTR example. I am only interested in the buffer/map-specific operations. I am aware I should do
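The asker's code is cut off above; for reference only, a minimal host-side sketch of the CL_MEM_ALLOC_HOST_PTR map/unmap pattern (not the asker's actual code; error checks omitted; function and variable names are illustrative) could look like this:

#include <CL/cl.h>
#include <string.h>

/* Sketch: round-trip data through a CL_MEM_ALLOC_HOST_PTR buffer using
 * map/unmap instead of explicit reads/writes. */
static void roundtrip_alloc_host_ptr(cl_context ctx, cl_command_queue queue,
                                     const float *input, float *output,
                                     size_t count)
{
    cl_int err;
    size_t bytes = count * sizeof(float);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                bytes, NULL, &err);

    /* Map for writing, fill from the host, unmap before any kernel uses it. */
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                 0, bytes, 0, NULL, NULL, &err);
    memcpy(p, input, bytes);
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    /* ... enqueue a kernel that reads/writes buf here ... */

    /* Map for reading to make the results visible to the host. */
    p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_READ,
                           0, bytes, 0, NULL, NULL, &err);
    memcpy(output, p, bytes);
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    clFinish(queue);
    clReleaseMemObject(buf);
}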

Determine host CPU from list of devices obtained in OpenCL?

Submitted by 久未见 on 2020-01-15 12:23:49
Question: I am calling clGetDeviceIDs with an array of cl_device_id and getting all available devices. From this list I want to remove the device that is actually the host CPU. Is there any foolproof way to do this? If two identical CPUs are installed, cl_device_info might not be enough to tell them apart. Answer 1: In OpenCL 1.1 and later, you can check if the device and the host have a unified memory subsystem by using CL_DEVICE_HOST_UNIFIED_MEMORY with
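The answer is truncated above; the query it refers to is typically done with clGetDeviceInfo, roughly as in this sketch (function and variable names are illustrative):

#include <CL/cl.h>
#include <stdio.h>

/* Sketch: report which devices share a memory subsystem with the host,
 * which usually singles out the host CPU device. `devices`/`num_devices`
 * are assumed to come from a prior clGetDeviceIDs call. */
static void report_unified_memory(const cl_device_id *devices,
                                  cl_uint num_devices)
{
    for (cl_uint i = 0; i < num_devices; ++i) {
        cl_bool unified = CL_FALSE;
        clGetDeviceInfo(devices[i], CL_DEVICE_HOST_UNIFIED_MEMORY,
                        sizeof(unified), &unified, NULL);
        printf("device %u: host-unified memory = %s\n",
               i, unified ? "yes" : "no");
    }
}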

AMD CPU versus Intel CPU openCL

Submitted by 纵饮孤独 on 2020-01-14 19:14:41
Question: Some friends and I want to use OpenCL. For this we are looking to buy a new computer, and we wondered whether an AMD or an Intel CPU is better for OpenCL. The graphics card will be an Nvidia and we have no choice about the graphics card, so we initially wanted to buy an Intel CPU, but after some research we found that AMD CPUs may be better with OpenCL. We didn't find benchmarks comparing the two. So here are our questions: Is AMD better than Intel with OpenCL? Does it matter to have a Nvidia

Memory allocation Nvidia vs AMD

Submitted by 淺唱寂寞╮ on 2020-01-13 20:36:10
Question: I know there is a 128 MB limit for a single block of GPU memory on AMD GPUs. Is there a similar limit on Nvidia GPUs? Answer 1: On a GTX 560, clGetDeviceInfo returns 256 MiB for CL_DEVICE_MAX_MEM_ALLOC_SIZE; however, I can allocate slightly less than 1 GiB. See this thread discussing the issue. On AMD, however, this limit is enforced. You can raise it by changing the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables (see this thread). Answer 2: You can query this information at runtime using
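Answer 2 is cut off above; the runtime query it presumably refers to is clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE, as in this sketch (names are illustrative; error handling omitted):

#include <CL/cl.h>
#include <stdio.h>

/* Sketch: query the per-allocation limit and total global memory
 * for a given device. */
static void print_alloc_limits(cl_device_id device)
{
    cl_ulong max_alloc = 0, global_mem = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    printf("max single allocation: %llu MiB of %llu MiB global memory\n",
           (unsigned long long)(max_alloc >> 20),
           (unsigned long long)(global_mem >> 20));
}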

Best GPU algorithm for calculating lists of neighbours

Submitted by 不羁的心 on 2020-01-13 12:16:28
Question: Given a collection of thousands of points in 3D, I need to get the list of neighbours for each particle that fall inside some cutoff value (in terms of Euclidean distance), and if possible, sorted from nearest to farthest. Which is the fastest GPU algorithm for this purpose in CUDA or OpenCL? Answer 1: One of the fastest GPU MD codes I'm aware of, HALMD, uses a (highly tuned) version of the same sort of approach that is used in the CUDA SDK "Particles" example. Both the HALMD paper
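The cell-list code the answer points to is not reproduced here; as a baseline for what those methods accelerate, a naive O(N^2) cutoff search in OpenCL C might look like this sketch (kernel and argument names are illustrative; sorting by distance is not shown):

/* Sketch: brute-force neighbour search within a cutoff radius.
 * Each work-item scans all other particles; cell-list approaches such as
 * the CUDA "Particles" example and HALMD avoid this full scan. */
__kernel void neighbours_brute_force(__global const float4 *pos,
                                     __global int *neighbour_idx,
                                     __global int *neighbour_count,
                                     const int n,
                                     const float cutoff2,
                                     const int max_neighbours)
{
    int i = get_global_id(0);
    if (i >= n)
        return;

    int count = 0;
    float4 pi = pos[i];
    for (int j = 0; j < n; ++j) {
        if (j == i)
            continue;
        float4 d = pi - pos[j];
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (r2 < cutoff2 && count < max_neighbours)
            neighbour_idx[i * max_neighbours + count++] = j;
    }
    neighbour_count[i] = count;
}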

Is it necessary to enqueue read/write when using CL_MEM_USE_HOST_PTR?

Submitted by 爱⌒轻易说出口 on 2020-01-13 06:43:13
Question: Assume that I am wait()ing for the kernel to finish its work. I was wondering whether, when allocating a buffer using the CL_MEM_USE_HOST_PTR flag, it is necessary to use enqueueRead/Write on the buffer, or whether they can always be omitted. Note that I am aware of this note in the reference: Calling clEnqueueReadBuffer to read a region of the buffer object with the ptr argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being read is
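The quoted note from the reference is cut off above; as an illustration of the usual alternative to explicit reads/writes, a blocking map of a CL_MEM_USE_HOST_PTR buffer can be used to make results visible through the original host pointer, roughly as in this sketch (names are illustrative; error handling omitted):

#include <CL/cl.h>

/* Sketch: for CL_MEM_USE_HOST_PTR buffers, the spec guarantees that host_ptr
 * contains the latest bits for a mapped region once the map completes, so
 * mapping (rather than clEnqueueReadBuffer) can synchronize the host copy. */
static void sync_use_host_ptr(cl_context ctx, cl_command_queue queue,
                              float *host_data, size_t count)
{
    cl_int err;
    size_t bytes = count * sizeof(float);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                bytes, host_data, &err);

    /* ... enqueue a kernel that writes to buf here ... */

    /* Blocking map: after this returns, host_data holds the kernel's output. */
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_READ,
                                 0, bytes, 0, NULL, NULL, &err);
    /* ... read results via host_data (or p) ... */
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    clReleaseMemObject(buf);
}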

need to convert C++ template to C99 code

Submitted by 两盒软妹~` on 2020-01-13 05:19:05
Question: I am porting CUDA code to OpenCL. CUDA allows C++ constructs like templates, while OpenCL is strictly C99. So, what is the most painless way of porting templates to C? I thought of using function pointers for the template parameters. Answer 1: Before there were templates, there were preprocessor macros. Search the web for "generic programming in C" for inspiration. Answer 2: Here is the technique I used to convert some CUDA algorithms from the Modern GPU code to my GPGPU VexCL library (with
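Answer 2 is truncated; as a small illustration of the macro approach from Answer 1 (the example itself is not from either answer), a "template" can be emulated in C99 by instantiating the same function body once per type:

/* Sketch: generate one function per element type via the preprocessor,
 * the C99 stand-in for a C++ function template. */
#define DEFINE_AXPY(T)                                  \
    void axpy_##T(T a, const T *x, T *y, int n)         \
    {                                                   \
        for (int i = 0; i < n; ++i)                     \
            y[i] = a * x[i] + y[i];                     \
    }

DEFINE_AXPY(float)   /* defines axpy_float(float, const float *, float *, int)    */
DEFINE_AXPY(double)  /* defines axpy_double(double, const double *, double *, int) */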

How to install aparapi

Submitted by こ雲淡風輕ζ on 2020-01-13 03:49:11
Question: I have been looking for a way to develop OpenCL in Java. I found aparapi interesting as it focuses on parallelization but generates OpenCL code as well. As I understand it, the code will run with or without a GPU but will still run parallelized. My trouble is: where can I find documentation on what to install and how? The AMD site is often pointed to, but it contains no information about aparapi; I also wondered whether their code will work on Nvidia cards. The links to Google Code are obsolete and the