Bound CUDA texture reads zero

只谈情不闲聊 提交于 2019-12-22 06:40:53

问题


I try to read values from a texture and write them back to global memory. I am sure the writing part works, beause I can put constant values in the kernel and I can see them in the output:

__global__ void
bartureKernel( float* g_odata, int width, int height) 
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    if(x < width && y < height) {
            unsigned int idx = (y*width + x);
            g_odata[idx] = tex2D(texGrad, (float)x, (float)y).x;

    }
}

The texture I want to use is a 2D float texture with two channels, so I defined it as:

texture<float2, 2, cudaReadModeElementType> texGrad;

And the code which calls the kernel initializes the texture with some constant non-zero values:

float* d_data_grad = NULL;

cudaMalloc((void**) &d_data_grad, gradientSize * sizeof(float));
CHECK_CUDA_ERROR;

texGrad.addressMode[0] = cudaAddressModeClamp;
texGrad.addressMode[1] = cudaAddressModeClamp;
texGrad.filterMode = cudaFilterModeLinear;
texGrad.normalized = false;

cudaMemset(d_data_grad, 50, gradientSize * sizeof(float));
CHECK_CUDA_ERROR;

cudaBindTexture(NULL, texGrad, d_data_grad, cudaCreateChannelDesc<float2>(), gradientSize * sizeof(float));

float* d_data_barture = NULL;
cudaMalloc((void**) &d_data_barture, outputSize * sizeof(float));
CHECK_CUDA_ERROR;

dim3 dimBlock(8, 8, 1);
dim3 dimGrid( ((width-1) / dimBlock.x)+1, ((height-1) / dimBlock.y)+1, 1);

bartureKernel<<< dimGrid, dimBlock, 0 >>>( d_data_barture, width, height);

I know, setting the texture bytes to all "50" doesn't make much sense in the context of floats, but it should at least give me some non-zero values to read.

I can only read zeros though...


回答1:


You are using cudaBindTexture to bind your texture to the memory allocated by cudaMalloc. In the kernel you are using tex2D function to read values from the texture. That is why it is reading zeros.

If you bind texture to linear memory using cudaBindTexture, it is read using tex1Dfetch inside the kernel.

tex2D is used to read only from those textures which are bound to pitch linear memory ( which is allocated by cudaMallocPitch ) using the function cudaBindTexture2D, or those textures which are bound to cudaArray using the function cudaBindTextureToArray

Here is the basic table, rest you can read from the programming guide:

Memory Type----------------- Allocated Using-----------------Bound Using-----------------------Read In The Kernel By

Linear Memory...................cudaMalloc........................cudaBindTexture.............................tex1Dfetch

Pitch Linear Memory.........cudaMallocPitch.............cudaBindTexture2D........................tex2D

cudaArray............................cudaMallocArray.............cudaBindTextureToArray.............tex1D or tex2D

3D cudaArray......................cudaMalloc3DArray........cudaBindTextureToArray.............tex3D




回答2:


To add on, access using tex1Dfetch is based on integer indexing. However, the rest are indexed based on floating point, and you have to add +0.5 to get the exact value you want.

I'm curious why do you create float and bind to a float2 texture? It may gives ambiguous results. float2 is not 2D float texture. It can actually be used for representation of complex number.

typedef struct {float x; float y;} float2;

I think this tutorial will help you understand how to use texture memory in cuda. http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/218100902

The kernel you shown does not benefit much from using texture. however, if utilized properly, by exploiting locality, texture memory can improve the performance by quite a lot. Also, it is useful for interpolation.



来源:https://stackoverflow.com/questions/13119813/bound-cuda-texture-reads-zero

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!