gpu-shared-memory

Using both dynamically-allocated and statically-allocated shared memory

こ雲淡風輕ζ submitted on 2019-12-19 10:51:39
Question: Suppose I have two __device__ CUDA functions, each with the following local variable:

    __shared__ int a[123];

and another function (say it's my kernel, i.e. a __global__ function) with:

    extern __shared__ int b[];

Is this explicitly allowed/forbidden by nVIDIA? (I don't see it in the programming guide, section B.2.3 on __shared__.) Do the sizes all count together towards the shared memory limit, or is it the maximum possibly in use at a single time? Or some other rule? This can be …
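For reference, here is a minimal sketch of the setup the question describes (the kernel name, thread count, and array contents are illustrative assumptions, not from the original post): a fixed-size __shared__ array inside a __device__ helper combined with a dynamically-sized extern __shared__ array in the kernel, whose size is passed as the third launch-configuration argument.

    // Static shared allocation: size fixed at compile time.
    __device__ int helper(int lane) {
        __shared__ int a[123];
        a[lane] = lane;
        __syncthreads();
        return a[122 - lane];           // read another thread's slot via shared memory
    }

    // Dynamic shared allocation: size supplied at launch.
    __global__ void kernel(int *out) {
        extern __shared__ int b[];
        int lane = threadIdx.x;
        b[lane] = helper(lane);
        __syncthreads();
        out[lane] = b[lane];
    }

    int main() {
        int *out;
        cudaMalloc(&out, 123 * sizeof(int));
        // Third <<<>>> argument: bytes of dynamic shared memory backing b[].
        kernel<<<1, 123, 123 * sizeof(int)>>>(out);
        cudaDeviceSynchronize();
        cudaFree(out);
        return 0;
    }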

Getting CUDA error "declaration is incompatible with previous 'variable_name'"

醉酒当歌 submitted on 2019-12-13 08:55:02
Question: I'm trying to compile a program including a kernel with MSVS 2012 and CUDA. I use shared memory, but unlike in this question regarding the same problem, I only use my variable name for this kernel's shared memory once, so there's no issue of redefinition. With code like this:

    template<typename T>
    __global__ void mykernel(
        const T* __restrict__ data,
        T* __restrict__ results)
    {
        extern __shared__ T warp_partial_results[];
        /* ... */
        warp_partial_results[lane_id] = something;
        /* ... */
        results …
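This error typically appears once the template is instantiated for more than one type: every extern __shared__ declaration refers to the same underlying array, so the compiler sees conflicting typed declarations of a single variable. A common workaround, sketched here under that assumption (the helper name is made up), is to declare the raw array once, untyped, and cast it per instantiation:

    template<typename T>
    __device__ T *shared_array() {
        // Declared with one fixed type for every instantiation, so the compiler
        // never sees two conflicting declarations of the same shared variable.
        extern __shared__ unsigned char shared_raw[];
        return reinterpret_cast<T *>(shared_raw);
    }

    template<typename T>
    __global__ void mykernel(const T *__restrict__ data,
                             T *__restrict__ results) {
        T *warp_partial_results = shared_array<T>();
        int tid = threadIdx.x;
        warp_partial_results[tid] = data[tid];     // placeholder for the real computation
        __syncthreads();
        results[tid] = warp_partial_results[tid];
    }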

Interpreting the verbose output of ptxas, part II

[亡魂溺海] submitted on 2019-12-11 05:24:44
Question: This question is a continuation of Interpreting the verbose output of ptxas, part I. When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with --ptxas-options=-v, we get a few lines of output such as:

    ptxas info : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
    ptxas info : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
        72 bytes stack frame, 0 bytes spill stores, …
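For anyone wanting to reproduce this kind of report, a small sketch (the kernel below is illustrative; only the compile flag and the general shape of the output come from the question):

    // example.cu -- compile with:  nvcc -arch=sm_20 --ptxas-options=-v -c example.cu
    // ptxas then prints, per entry function: stack frame size, spill stores/loads,
    // register count, and constant/shared memory usage.
    __global__ void searchkernel_like(const float *in, float *out) {
        float scratch[8];             // a local array that may occupy the stack frame
        for (int i = 0; i < 8; ++i)
            scratch[i] = in[threadIdx.x * 8 + i];
        float s = 0.0f;
        for (int i = 0; i < 8; ++i)
            s += scratch[i];
        out[threadIdx.x] = s;
    }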

Is there a limit to OpenCL local memory?

我是研究僧i submitted on 2019-11-30 12:53:45
Question: Today I added four more __local variables to my kernel to dump intermediate results in. But just adding the four more variables to the kernel's signature and adding the corresponding kernel arguments renders all of the kernel's output as "0"s. None of the cl functions returns an error code. I further tried to add only one of the two smaller variables. If I add only one of them, it works, but if I add both of them, it breaks down. So could this behavior of OpenCL mean that I allocated too much __local memory? How do I find out how much __local memory is usable by me?

Kyle Lutz: The amount of …
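The device's local-memory budget can be queried directly. A minimal host-side sketch (error handling omitted; it assumes a single GPU device on the first platform):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        cl_ulong local_mem_size = 0;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        // Total __local memory available to a work-group on this device.
        clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                        sizeof(local_mem_size), &local_mem_size, NULL);

        printf("CL_DEVICE_LOCAL_MEM_SIZE: %llu bytes\n",
               (unsigned long long)local_mem_size);
        return 0;
    }

For a particular compiled kernel, clGetKernelWorkGroupInfo with CL_KERNEL_LOCAL_MEM_SIZE similarly reports how much local memory that kernel actually consumes.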

GPU shared memory size is very small - what can I do about it?

╄→尐↘猪︶ㄣ submitted on 2019-11-29 03:08:22
The size of the shared memory ("local memory" in OpenCL terms) is only 16 KiB on most nVIDIA GPUs of today. I have an application in which I need to create an array of 10,000 integers, so the amount of memory I will need to fit 10,000 integers is 10,000 * 4 B = 40 KB. How can I work around this? Is there any GPU that has more than 16 KiB of shared memory?

Think of shared memory as an explicitly managed cache. You will need to store your array in global memory and cache parts of it in shared memory as needed, either by making multiple passes or some other scheme which minimises the number of …
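A sketch of the multiple-pass idea (tile size, kernel name, and the placeholder computation are illustrative assumptions): stage chunks of the 10,000-element global array through a shared-memory tile small enough for a 16 KiB budget.

    #define TILE 2048   // 2048 ints = 8 KiB of shared memory, half of a 16 KiB budget

    __global__ void process_in_tiles(const int *global_data, int *out, int n) {
        __shared__ int tile[TILE];

        for (int base = 0; base < n; base += TILE) {
            // Cooperatively load one chunk of the big array into shared memory.
            for (int i = threadIdx.x; i < TILE && base + i < n; i += blockDim.x)
                tile[i] = global_data[base + i];
            __syncthreads();

            // Work on the cached chunk (placeholder computation).
            for (int i = threadIdx.x; i < TILE && base + i < n; i += blockDim.x)
                out[base + i] = tile[i] * 2;
            __syncthreads();
        }
    }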

allocating shared memory

半腔热情 submitted on 2019-11-26 17:25:03
Question: I am trying to allocate shared memory by using a constant parameter, but I am getting an error. My kernel looks like this:

    __global__ void Kernel(const int count)
    {
        __shared__ int a[count];
    }

and I am getting an error saying:

    error: expression must have a constant value

count is const! Why am I getting this error? And how can I get around this?

Answer 1: const doesn't mean "constant", it means "read-only". A constant expression is something whose value is known to the compiler at compile time.

Answer 2: CUDA supports dynamic shared memory allocation. If you define the kernel like this:

    __global__ void Kernel(const int …
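The post is cut off here, but the pattern Answer 2 is pointing at is the standard extern __shared__ declaration, with the size supplied as the third launch-configuration argument (reconstructed as a sketch; the launch parameters are illustrative):

    __global__ void Kernel(const int count) {
        extern __shared__ int a[];   // sized at launch time, not compile time
        if (threadIdx.x < count)
            a[threadIdx.x] = threadIdx.x;
        /* ... */
    }

    int main() {
        const int count = 123;
        // Third <<<>>> argument: bytes of dynamic shared memory to reserve for a[].
        Kernel<<<1, 128, count * sizeof(int)>>>(count);
        cudaDeviceSynchronize();
        return 0;
    }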
