gpu-constant-memory

CUDA constant memory symbols

Submitted by ☆樱花仙子☆ on 2020-01-05 05:39:07

Question: I am using CUDA 5.0 and I have modules which are compiled separately. I would like to access the same value in constant memory from all modules. The problem is the following: when I define the symbol in each module, the linker claims that the symbol has been redefined. Is there a workaround or a solution for this problem? Thank you for helping. Answer 1: In CUDA separate compilation mode there is a true linker, and every symbol which is linked into the final device binary payload must be …
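The truncated answer points toward the usual fix: keep a single definition of the __constant__ symbol in one translation unit, declare it extern in every other module, and link with relocatable device code. A minimal sketch (file names and the symbol name are illustrative, not from the original question):

```cuda
// params.cu — the ONE translation unit that defines the symbol
__device__ __constant__ int params[16];

// other.cu — every other module only declares it
extern __device__ __constant__ int params[16];

__global__ void useParams(int *out) {
    out[threadIdx.x] = params[threadIdx.x % 16];
}

// Build with relocatable device code so the device linker
// can resolve the extern reference across modules:
//   nvcc -rdc=true params.cu other.cu -o app
```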

When should texture memory be preferred over constant memory?

Submitted by 南楼画角 on 2019-12-19 04:50:33

Question: Does storing data in constant memory provide any benefit over texture memory on the Pascal architecture if the data request frequency among threads is very high (every thread picks at least one value from a specific column)? EDIT: This is a split version of this question to improve community searching. Answer 1: If the expectations for constant memory usage are satisfied, the use of constant memory is a good idea in the general case. It allows your code to take advantage of an additional …
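The key expectation the answer alludes to is uniform access: the constant cache broadcasts when all threads in a warp read the same address, and serializes when they do not. A hypothetical sketch contrasting the two access patterns (array names and sizes are made up for illustration):

```cuda
__constant__ float c_table[256];

__global__ void demo(const float * __restrict__ g_table,
                     const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Uniform index: every thread in the warp reads the same
    // c_table entry, so the constant cache broadcasts it. Fast.
    float a = c_table[blockIdx.x % 256];

    // Per-thread index: reads of different constant addresses
    // within a warp would serialize; for scattered reads the
    // read-only data path (__ldg / texture) is usually better.
    float b = __ldg(&g_table[i % 256]);

    out[i] = a * in[i] + b;
}
```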

CUDA Constant Memory Error

Submitted by 隐身守侯 on 2019-12-11 10:55:39

Question: I am trying to write sample code using constant memory with CUDA 5.5. I have 2 constant arrays of size 3000 each. I have another global array X of size N. I want to compute Y[tid] = X[tid]*A[tid%3000] + B[tid%3000]. Here is the code: #include <iostream> #include <stdio.h> using namespace std; #include <cuda.h> __device__ __constant__ int A[3000]; __device__ __constant__ int B[3000]; __global__ void kernel( int *dc_A, int *dc_B, int *X, int *out, int N) { int tid = threadIdx.x + blockIdx.x …
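The excerpt cuts off, but one pitfall is already visible: __constant__ symbols are referenced directly by name inside kernels, not passed in as pointer parameters (the dc_A, dc_B arguments). A corrected sketch of the computation described above (host-side details are assumed, not from the original post):

```cuda
#include <cuda_runtime.h>

__device__ __constant__ int A[3000];
__device__ __constant__ int B[3000];

__global__ void kernel(const int *X, int *Y, int N) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < N)
        Y[tid] = X[tid] * A[tid % 3000] + B[tid % 3000];
}

// Host side: the constant arrays are filled through the symbol,
// not through a device pointer:
//   cudaMemcpyToSymbol(A, h_A, 3000 * sizeof(int));
//   cudaMemcpyToSymbol(B, h_B, 3000 * sizeof(int));
```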

Interpreting the verbose output of ptxas, part II

Submitted by [亡魂溺海] on 2019-12-11 05:24:44

Question: This question is a continuation of Interpreting the verbose output of ptxas, part I. When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as: ptxas info : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20' ptxas info : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*) 72 bytes stack frame, 0 bytes spill stores, …

CUDA constant memory banks

Submitted by 心已入冬 on 2019-12-07 01:21:16

Question: When we check register usage with ptxas we see something like this: ptxas info : Used 63 registers, 244 bytes cmem[0], 51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16]. I wonder whether there is currently any documentation that clearly explains cmem[x]. What is the point of separating constant memory into multiple banks, how many banks are there in total, and what are the banks other than 0, 2, 14, and 16 used for? As a side note, @njuffa (special thanks to you) previously explained on NVIDIA's forum what banks 0, 2, 14, and 16 are: used constant memory is partitioned in constant program …

using constant memory prints address instead of value in cuda

Submitted by 谁都会走 on 2019-12-03 00:49:47

Question: I am trying to use constant memory in the code below, with the constant memory assigned a value from the kernel rather than using cudaMemcpyToSymbol. #include <iostream> using namespace std; #define N 10 //__constant__ int constBuf_d[N]; __constant__ int *constBuf; __global__ void foo( int *results ) { int tdx = threadIdx.x; int idx = blockIdx.x * blockDim.x + tdx; if( idx < N ) { constBuf[idx]=1; results[idx] = constBuf[idx]; } } // main routine that executes on the host int main(int argc, char* argv[]) { int …
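The declaration __constant__ int *constBuf places only a pointer (uninitialized) in constant memory, and the kernel then writes through that garbage pointer, which is why an address-like value appears; constant memory is also read-only from device code, so values must be set from the host. A sketch of the conventional pattern (launch and cleanup elided, array contents assumed):

```cuda
#include <cuda_runtime.h>

#define N 10

// An array in constant memory — read-only from device code.
__constant__ int constBuf[N];

__global__ void foo(int *results) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        results[idx] = constBuf[idx];  // reading is fine; writing is not
}

int main() {
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = 1;
    // Constant memory is written from the HOST, before the launch:
    cudaMemcpyToSymbol(constBuf, h, sizeof(h));
    // ... allocate results on the device, launch foo, copy back ...
    return 0;
}
```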

cudaMemcpyToSymbol performance

Submitted by 笑着哭i on 2019-12-02 08:49:18

Question: I have some functions that load a variable into constant device memory and launch a kernel. I noticed that the first time one of the functions loads a variable into constant memory it takes 0.6 seconds, but subsequent loads into constant memory are very fast (0.0008 seconds). This behaviour occurs regardless of which function comes first in main. Below is an example code: __constant__ double res1; __global__ void kernel1(...) {...} void function1() { double resHost = 255 / ((double) size); CUDA_CHECK …
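A common explanation for this pattern (not confirmed in the truncated post) is that the first CUDA runtime call pays for lazy context creation, so the startup cost lands on whichever cudaMemcpyToSymbol happens to run first. A sketch of separating that cost with an explicit warm-up call (the value of 'size' is assumed here):

```cuda
#include <cuda_runtime.h>

__constant__ double res1;

int main() {
    // cudaFree(0) is a cheap call that forces context creation,
    // so later timings measure the copy itself, not startup cost.
    cudaFree(0);

    double resHost = 255.0 / 1024.0;  // 'size' assumed to be 1024
    cudaMemcpyToSymbol(res1, &resHost, sizeof(double));  // now fast
    return 0;
}
```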

CUDA writing to constant memory wrong value

Submitted by 会有一股神秘感。 on 2019-12-02 07:04:56

Question: I have the following code to copy from a host variable to a __constant__ variable in CUDA: int main(int argc, char **argv){ int exit_code; if (argc < 4) { std::cout << "Usage: \n " << argv[0] << " <input> <output> <nColors>" << std::endl; return 1; } Color *h_input; int h_rows, h_cols; timer1.Start(); exit_code = readText2RGB(argv[1], &h_input, &h_rows, &h_cols); timer1.Stop(); std::cout << "Reading: " << timer1.Elapsed() << std::endl; if (exit_code != SUCCESS){ std::cout << "Error trying to …
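The excerpt stops before the actual copy, but the usual shape of a host-to-__constant__ transfer for a struct type like Color is sketched below (everything beyond the Color name — the symbol name, the bound, the helper — is an assumption for illustration):

```cuda
#include <cuda_runtime.h>

struct Color { unsigned char r, g, b; };

#define MAX_COLORS 256            // assumed upper bound on nColors
__constant__ Color c_colors[MAX_COLORS];

// Copy nColors entries from host memory into the constant symbol.
// The byte count must not exceed the symbol's declared size.
void uploadColors(const Color *h_colors, int nColors) {
    cudaMemcpyToSymbol(c_colors, h_colors,
                       nColors * sizeof(Color));
}
```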