问题
I try to read values from a texture and write them back to global memory. I am sure the writing part works, beause I can put constant values in the kernel and I can see them in the output:
__global__ void
bartureKernel( float* g_odata, int width, int height)
{
unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
if(x < width && y < height) {
unsigned int idx = (y*width + x);
g_odata[idx] = tex2D(texGrad, (float)x, (float)y).x;
}
}
The texture I want to use is a 2D float texture with two channels, so I defined it as:
texture<float2, 2, cudaReadModeElementType> texGrad;
And the code which calls the kernel initializes the texture with some constant non-zero values:
float* d_data_grad = NULL;
cudaMalloc((void**) &d_data_grad, gradientSize * sizeof(float));
CHECK_CUDA_ERROR;
texGrad.addressMode[0] = cudaAddressModeClamp;
texGrad.addressMode[1] = cudaAddressModeClamp;
texGrad.filterMode = cudaFilterModeLinear;
texGrad.normalized = false;
cudaMemset(d_data_grad, 50, gradientSize * sizeof(float));
CHECK_CUDA_ERROR;
cudaBindTexture(NULL, texGrad, d_data_grad, cudaCreateChannelDesc<float2>(), gradientSize * sizeof(float));
float* d_data_barture = NULL;
cudaMalloc((void**) &d_data_barture, outputSize * sizeof(float));
CHECK_CUDA_ERROR;
dim3 dimBlock(8, 8, 1);
dim3 dimGrid( ((width-1) / dimBlock.x)+1, ((height-1) / dimBlock.y)+1, 1);
bartureKernel<<< dimGrid, dimBlock, 0 >>>( d_data_barture, width, height);
I know, setting the texture bytes to all "50" doesn't make much sense in the context of floats, but it should at least give me some non-zero values to read.
I can only read zeros though...
回答1:
You are using cudaBindTexture
to bind your texture to the memory allocated by cudaMalloc
. In the kernel you are using tex2D
function to read values from the texture. That is why it is reading zeros.
If you bind texture to linear memory using cudaBindTexture
, it is read using tex1Dfetch
inside the kernel.
tex2D
is used to read only from those textures which are bound to pitch linear memory ( which is allocated by cudaMallocPitch
) using the function cudaBindTexture2D
, or those textures which are bound to cudaArray using the function cudaBindTextureToArray
Here is the basic table, rest you can read from the programming guide:
Memory Type----------------- Allocated Using-----------------Bound Using-----------------------Read In The Kernel By
Linear Memory...................cudaMalloc
........................cudaBindTexture
.............................tex1Dfetch
Pitch Linear Memory.........cudaMallocPitch
.............cudaBindTexture2D
........................tex2D
cudaArray............................cudaMallocArray
.............cudaBindTextureToArray
.............tex1D
or tex2D
3D cudaArray......................cudaMalloc3DArray
........cudaBindTextureToArray
.............tex3D
回答2:
To add on, access using tex1Dfetch is based on integer indexing. However, the rest are indexed based on floating point, and you have to add +0.5 to get the exact value you want.
I'm curious why do you create float and bind to a float2 texture? It may gives ambiguous results. float2 is not 2D float texture. It can actually be used for representation of complex number.
typedef struct {float x; float y;} float2;
I think this tutorial will help you understand how to use texture memory in cuda. http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/218100902
The kernel you shown does not benefit much from using texture. however, if utilized properly, by exploiting locality, texture memory can improve the performance by quite a lot. Also, it is useful for interpolation.
来源:https://stackoverflow.com/questions/13119813/bound-cuda-texture-reads-zero