Calculate squared Euclidean distance matrix on GPU

隐身守侯 提交于 2019-12-04 13:35:18

问题


Let p be a matrix of first set of locations where each row gives the coordinates of a particular point. Similarly, let q be a matrix of second set of locations where each row gives the coordinates of a particular point.

Then formula for pairwise squared Euclidean distance is:

k(i,j) = (p(i,:) - q(j,:))*(p(i,:) - q(j,:))', 

where p(i,:) denotes i-th row of matrix p, and p' denotes the transpose of p.

I would like to compute matrix k on CUDA-enabled GPU (NVidia Tesla) in C++. I have OpenCV v.2.4.1 with GPU support but I'm open to other alternatives, like Thrust library. However, I'm not too familiar with GPU programming. Can you suggest an efficient way to accomplish this task? What C++ libraries should I use?


回答1:


The problem looks simple enough to make a library overkill.

Without knowing the range of i and j, I'd suggest you partition k into blocks of a multiple of 32 threads each and in each block, compute

float sum, myp[d];
int i = blockIdx.x*blockDim.x + threadIdx.x;
for ( int kk = 0 ; kk < d ; kk++ )
    myp[kk] = p(i,kk);
for ( j = blockIdx.y*blockDim.y ; j < (blockIdx.y+1)*blockDim ; j++ ) {
    sum = 0.0f;
    #pragma unroll
    for ( int kk = 0 ; kk < d ; kk++ ) {
        temp = myp[kk] - q(j,kk);
        sum += temp*temp;
        }
    k(i,j) = sum;
    }

where I am assuming that your data has d dimensions and writing p(i,k), q(j,k) and k(i,j) to mean an access to a two-dimensional array. I also took the liberty in assuming that your data is of type float.

Note that depending on how k is stored, e.g. row-major or column-major, you may want to loop over i per thread instead to get coalesced writes to k.



来源:https://stackoverflow.com/questions/11196748/calculate-squared-euclidean-distance-matrix-on-gpu

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!