gpgpu

Improving random memory access when random access is needed

南楼画角 submitted on 2019-12-21 20:44:27
Question: The basic concept of what I am doing is the complete coalition structure formation problem / combinatorial auctions. Given a set of N agents, which partition of the agents into disjoint subsets yields the best outcome? E.g. agents = {a,b} and their values {a} = 2, {b} = 3, {a,b} = 4. In this instance the coalition structure {{a},{b}} = 5 would give the best outcome, as it is the best splitting of {a,b} into pairwise disjoint subsets. So in short the problem is about splitting a set and checking whether any of the splittings' sums is greater than …
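
A minimal CPU-side sketch of the splitting-and-summing idea described above, in plain C; the value table v[] and the standard O(3^n) subset DP are illustrative assumptions, not the asker's GPU code:

/* Illustrative only: optimal coalition structure value via the standard
 * O(3^n) subset DP.  v[mask] is a made-up value table for the example
 * agents {a,b}: v{a}=2, v{b}=3, v{a,b}=4, so the answer printed is 5. */
#include <stdio.h>

#define N 2                        /* number of agents (assumption) */

int main(void)
{
    int full = (1 << N) - 1;
    int v[1 << N] = {0, 2, 3, 4};  /* value of each coalition (bitmask) */
    int best[1 << N];

    best[0] = 0;
    for (int S = 1; S <= full; S++) {
        best[S] = v[S];            /* option 1: keep S as one coalition */
        /* option 2: split off a proper non-empty subset T of S */
        for (int T = (S - 1) & S; T; T = (T - 1) & S)
            if (v[T] + best[S ^ T] > best[S])
                best[S] = v[T] + best[S ^ T];
    }
    printf("best coalition structure value = %d\n", best[full]);  /* 5 */
    return 0;
}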

How to write a fragment shader in GLSL to sort an array of 9 floating point numbers

风格不统一 submitted on 2019-12-21 20:19:17
Question: I am writing a fragment shader in order to compute the per-pixel median of 9 images. I have never worked with GLSL before, but it seemed like the right tool for the job, as OpenCL isn't available on iOS and computing the median on the CPU is inefficient. Here's what I have so far:

uniform sampler2D frames[9];
uniform vec2 wh;

void main(void)
{
    vec4 sortedFrameValues[9];
    float sortedGrayScaleValues[9];

    for (int i = 0; i < 9; i++)
    {
        sortedFrameValues[i] = texture2D(frames[i], -gl_FragCoord.xy / wh);
        sortedGrayScaleValues…
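
A hedged sketch of the sorting step, in plain C rather than GLSL: a fixed-bound insertion sort of 9 values whose loop structure ports more or less directly to a fragment shader; after sorting, element 4 is the median. The sample grayscale values are made up:

/* Illustrative only, in plain C rather than GLSL: fixed-bound insertion
 * sort of 9 values; after sorting, element 4 is the median. */
#include <stdio.h>

static void sort9(float v[9])
{
    for (int i = 1; i < 9; i++) {
        float key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) {   /* shift larger values right */
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = key;
    }
}

int main(void)
{
    float gray[9] = {0.9f, 0.1f, 0.5f, 0.3f, 0.7f, 0.2f, 0.8f, 0.4f, 0.6f};
    sort9(gray);
    printf("median = %.2f\n", gray[4]);  /* 0.50 */
    return 0;
}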

How to generate, compile and run CUDA kernels at runtime

百般思念 submitted on 2019-12-21 14:19:32
Question: Well, I have quite a delicate question :) Let's start with what I have:
- Data: a large array of data, copied to the GPU
- Program: generated by the CPU (host), which needs to be evaluated for every element of that array
The program changes very frequently, can be generated as a CUDA string, a PTX string or something else (?), and needs to be re-evaluated after each change. What I want: basically I just want to make this as effective (fast) as possible, e.g. avoid compiling CUDA to PTX. The solution can even be …
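
A hedged sketch of the runtime-compilation path using NVRTC plus the CUDA driver API, which compiles a CUDA source string to PTX in-process and launches the result; the kernel source, names and launch shape are invented for illustration:

/* Illustrative only: runtime compilation with NVRTC + the CUDA driver API.
 * Link with -lnvrtc -lcuda (assumption); error checking omitted for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <nvrtc.h>

int main(void)
{
    const char *src =
        "extern \"C\" __global__ void scale(float *d, float f, int n) {\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) d[i] *= f;\n"
        "}\n";

    /* 1. CUDA source string -> PTX, entirely at runtime */
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "scale.cu", 0, NULL, NULL);
    nvrtcCompileProgram(prog, 0, NULL);
    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    char *ptx = (char *)malloc(ptxSize);
    nvrtcGetPTX(prog, ptx);
    nvrtcDestroyProgram(&prog);

    /* 2. PTX -> loadable module -> kernel handle */
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuModuleLoadData(&mod, ptx);
    cuModuleGetFunction(&fn, mod, "scale");

    /* 3. Launch on data that already lives on the GPU */
    int n = 1 << 20;
    float f = 2.0f;
    CUdeviceptr d;
    cuMemAlloc(&d, n * sizeof(float));
    void *args[] = { &d, &f, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, NULL, args, NULL);
    cuCtxSynchronize();

    cuMemFree(d);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    free(ptx);
    return 0;
}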

Parameters to CUDA kernels

≡放荡痞女 submitted on 2019-12-21 09:28:53
Question: When invoking a CUDA kernel with a specific thread configuration, are there any strict rules on which memory space (device/host) kernel parameters should reside in, and what type they should be? Suppose I launch a 1-D grid of threads with kernel<<<numblocks, threadsperblock>>>(/* parameters */). Can I pass an integer parameter int foo, which is a host integer variable, directly to the CUDA kernel? Or should I cudaMalloc memory for a single integer, say dev_foo, and then cudaMemcpy foo into dev_foo …
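
A minimal CUDA C sketch illustrating that scalar kernel arguments are passed by value from an ordinary host variable; only data reached through pointers has to live in device memory. The kernel and variable names are made up for the example:

/* Illustrative only: foo is copied by value at launch time, no cudaMalloc
 * or cudaMemcpy is needed for the scalar itself. */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void addFoo(int foo, int *out)   /* foo arrives by value */
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] += foo;
}

int main(void)
{
    const int n = 256;
    int foo = 42;                       /* ordinary host variable */
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemset(d_out, 0, n * sizeof(int));

    addFoo<<<n / 256, 256>>>(foo, d_out);   /* pass foo directly */
    cudaDeviceSynchronize();

    int host_out[256];
    cudaMemcpy(host_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", host_out[0]);   /* 42 */
    cudaFree(d_out);
    return 0;
}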

How to check for GPU on CentOS Linux

陌路散爱 submitted on 2019-12-21 07:55:16
Question: It is suggested that on Linux a GPU can be found with the command lspci | grep VGA. It works fine on Ubuntu, but when I try the same on CentOS, it says the lspci command is not found. How can I check for the GPU card on CentOS? Note that I'm not the administrator of the machine and I only use it remotely from the command line. I intend to use the GPU as a GPGPU on that machine, but first I need to check whether it even has one. Answer 1: Have you tried launching /sbin/lspci or /usr/sbin/lspci? Answer 2: This …
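
Not from the answers above, but since the end goal is GPGPU, a hedged alternative is to probe for a GPU directly through the OpenCL API, assuming an OpenCL ICD/runtime is installed on the machine:

/* Illustrative only: probing for a GPU via OpenCL instead of lspci.
 * Link with -lOpenCL (assumption). */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    clGetPlatformIDs(8, platforms, &num_platforms);
    if (num_platforms > 8) num_platforms = 8;

    for (cl_uint p = 0; p < num_platforms; p++) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;                        /* no GPU on this platform */
        if (num_devices > 8) num_devices = 8;
        for (cl_uint d = 0; d < num_devices; d++) {
            char name[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(name), name, NULL);
            printf("GPU found: %s\n", name);
        }
    }
    return 0;
}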

Create local array dynamically inside OpenCL kernel

好久不见. submitted on 2019-12-21 04:59:11
Question: I have an OpenCL kernel that needs to process an array as multiple arrays, where each sub-array sum is saved in a local cache array. For example, imagine the following array: [[1, 2, 3, 4], [10, 30, 1, 23]]. Each work-group gets one array (in the example we have 2 work-groups); each work-item processes two array indexes (for example multiplying the value by the local_id), and the work-item result is saved in a work-group shared array.

__kernel void test(__global int **values, __global int *result, …
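
A hedged host-side sketch of the usual way to get a runtime-sized __local array in OpenCL: the kernel declares a __local pointer argument, and the host sizes it with clSetKernelArg by passing the byte count and a NULL value. Kernel and argument names are assumptions, not the asker's code:

/* Illustrative only.  Kernel side (as a source string) would look like:
 *   __kernel void sum_rows(__global const int *values,
 *                          __global int *result,
 *                          __local int *cache)      // sized by the host
 *   { ... }
 */
#include <CL/cl.h>

void set_args(cl_kernel kernel, cl_mem values, cl_mem result,
              size_t local_size /* work-items per group */)
{
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &values);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &result);
    /* arg 2 is __local: pass only the byte size and a NULL pointer */
    clSetKernelArg(kernel, 2, local_size * sizeof(cl_int), NULL);
}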

How many threads (or work-items) can run at the same time?

末鹿安然 submitted on 2019-12-21 03:21:08
Question: I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL. My question is how to compute the limit of a GPU device (in number of threads). From what I understand, there are a number of work-groups (the equivalent of blocks in CUDA) that contain a number of work-items (~ CUDA threads). How do I get the number of work-groups present on my card (and that can run at the same time) and the number of work-items present in one work-group? To what does CL_DEVICE_MAX_COMPUTE_UNITS …
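
A hedged sketch of querying the limits mentioned in the question with clGetDeviceInfo, for a cl_device_id obtained elsewhere; note that CL_DEVICE_MAX_COMPUTE_UNITS counts compute units (SMs on NVIDIA hardware), not work-groups, and CL_DEVICE_MAX_WORK_GROUP_SIZE bounds work-items per group:

/* Illustrative only; assumes the common case of 3 work-item dimensions. */
#include <stdio.h>
#include <CL/cl.h>

void print_limits(cl_device_id dev)
{
    cl_uint compute_units;
    size_t  max_wg_size, max_item_sizes[3];

    clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(compute_units), &compute_units, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg_size), &max_wg_size, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(max_item_sizes), max_item_sizes, NULL);

    printf("compute units:            %u\n", compute_units);
    printf("max work-items per group: %zu\n", max_wg_size);
    printf("max work-item sizes:      %zu x %zu x %zu\n",
           max_item_sizes[0], max_item_sizes[1], max_item_sizes[2]);
}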

Using multiple GPUs OpenCL

自古美人都是妖i submitted on 2019-12-21 03:03:48
Question: I have a loop within which I am launching multiple kernels onto a GPU. Below is the snippet:

for (int idx = start; idx <= end; idx++)
{
    ret = clEnqueueNDRangeKernel(command_queue, memset_kernel, 1, NULL,
                                 &global_item_size_memset, &local_item_size,
                                 0, NULL, NULL);
    ASSERT_CL(ret, "Error after launching 1st memset_kernel !");
    ret = clEnqueueNDRangeKernel(command_queue, cholesky_kernel, 1, NULL,
                                 &global_item_size_cholesky, &local_item_size,
                                 0, NULL, NULL);
    ASSERT_CL(ret, "Error after launching …
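
A hedged sketch of the common multi-GPU pattern in OpenCL: one command queue per device within a shared context, with loop iterations distributed round-robin across the queues. The helper, its parameters and the per-device kernel objects are assumptions, not the asker's code:

/* Illustrative only; assumes num_devs <= 8 and one kernel object per device. */
#include <CL/cl.h>

void launch_on_all_gpus(cl_context ctx, cl_device_id *devs, cl_uint num_devs,
                        cl_kernel *kernels, size_t global_size,
                        size_t local_size, int start, int end)
{
    cl_command_queue queues[8];
    for (cl_uint d = 0; d < num_devs; d++)
        queues[d] = clCreateCommandQueue(ctx, devs[d], 0, NULL);

    /* round-robin the loop iterations over the available GPUs */
    for (int idx = start; idx <= end; idx++) {
        cl_uint d = (cl_uint)(idx - start) % num_devs;
        clEnqueueNDRangeKernel(queues[d], kernels[d], 1, NULL,
                               &global_size, &local_size, 0, NULL, NULL);
    }

    for (cl_uint d = 0; d < num_devs; d++) {
        clFinish(queues[d]);                 /* wait for every device */
        clReleaseCommandQueue(queues[d]);
    }
}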
