CUDA Add Rows of a Matrix

纵然是瞬间 submitted on 2019-12-06 13:36:30

Instead of looping over each column, parallelize on the columns. Each of 4600 threads sums the 9600 entries in its column, and puts the sum in the appropriate place in the result vector.
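A minimal sketch of that one-thread-per-column idea, written as a plain CUDA kernel. The names `column_sums_kernel` and `column_sums` are hypothetical, and the sketch assumes a row-major matrix of floats, which the original question may not have used. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per column: thread `col` walks down its column and adds every entry.
// Assumes row-major storage, so neighbouring threads read neighbouring addresses.
__global__ void column_sums_kernel(const float* matrix, float* sums, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;  // the last block may have spare threads

    float sum = 0.0f;
    for (int row = 0; row < rows; ++row)
        sum += matrix[row * cols + col];
    sums[col] = sum;
}

// Hypothetical host wrapper: copies the matrix to the device, launches a single
// kernel, and copies the per-column sums back.
std::vector<float> column_sums(const std::vector<float>& host_matrix, int rows, int cols) {
    float* d_matrix = nullptr;
    float* d_sums = nullptr;
    cudaMalloc(&d_matrix, sizeof(float) * rows * cols);
    cudaMalloc(&d_sums, sizeof(float) * cols);
    cudaMemcpy(d_matrix, host_matrix.data(), sizeof(float) * rows * cols,
               cudaMemcpyHostToDevice);

    int threads = 256;                            // arbitrary block size for this sketch
    int blocks = (cols + threads - 1) / threads;  // enough blocks to cover every column
    column_sums_kernel<<<blocks, threads>>>(d_matrix, d_sums, rows, cols);

    std::vector<float> sums(cols);
    cudaMemcpy(sums.data(), d_sums, sizeof(float) * cols, cudaMemcpyDeviceToHost);
    cudaFree(d_matrix);
    cudaFree(d_sums);
    return sums;
}
```

With the sizes mentioned above, a call would look like `column_sums(matrix, 9600, 4600)`, which launches one kernel rather than one per row.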

If you're looking for a library to make working with CUDA simpler, I highly recommend Thrust: http://code.google.com/p/thrust/

Using Thrust, I would create a functor to hold your matrix's pointer in device memory, and then map it over a sequence of column indices. The operator() of the functor would take an index, sum up everything in that column of the matrix, and return the sum. Then you would have your sum sitting in a thrust::device_vector without any memory copies (or even direct CUDA calls).

Your functor might look something like:

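// Assumed matrix view: row-major ints in device memory
struct Matrix { const int* data; int rows, cols; };

struct ColumnSumFunctor {
    const Matrix matrix;

    // Make a functor to sum the matrix
    ColumnSumFunctor(const Matrix& matrix) : matrix(matrix) {}

    // Compute and return the sum of the specified column
    __device__
    int operator()(const int& column) const {
        int sum = 0;
        for (int row = 0; row < matrix.rows; ++row)
            sum += matrix.data[row * matrix.cols + column];
        return sum;
    }
};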
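To show how such a functor could be mapped over the column indices, here is a hedged, self-contained sketch using `thrust::transform` with a `thrust::counting_iterator`. The names `ColumnSum` and `column_sums_thrust` are hypothetical, and the row-major int layout is an assumption. The copy back to a `std::vector` at the end is only there so the result can be inspected on the host.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical functor: sums one column of a row-major int matrix.
struct ColumnSum {
    const int* data;  // raw device pointer to rows * cols ints
    int rows;
    int cols;

    __host__ __device__
    int operator()(int column) const {
        int sum = 0;
        for (int row = 0; row < rows; ++row)
            sum += data[row * cols + column];
        return sum;
    }
};

// Returns the per-column sums of a rows x cols row-major matrix.
std::vector<int> column_sums_thrust(const std::vector<int>& host_matrix, int rows, int cols) {
    // Constructing a device_vector from host iterators uploads the data.
    thrust::device_vector<int> matrix(host_matrix.begin(), host_matrix.end());
    thrust::device_vector<int> sums(cols);

    ColumnSum op = { thrust::raw_pointer_cast(matrix.data()), rows, cols };

    // Map the functor over the column indices 0 .. cols-1.
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(cols),
                      sums.begin(), op);

    // Optional: bring the sums back to the host.
    std::vector<int> result(cols);
    thrust::copy(sums.begin(), sums.end(), result.begin());
    return result;
}
```

If the sums are consumed by later GPU work, the final copy can be dropped and `sums` used directly as a `thrust::device_vector`.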

Reduction is a very basic operation in GPGPU and is supposed to be fast, so even 9600 reductions shouldn't be slow.

What graphics card are you using?

I suggest you split it into 9600 arrays and reduce each array of 4800 elements to a single result. Instead of reduceTotal, I suggest using CUDPP to perform the reduction. CUDPP is like the STL for CUDA, and it is implemented with performance in mind.

http://code.google.com/p/cudpp/
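For readers who want to see what a per-array reduction looks like without a library, here is a hand-written sketch. It does not use the CUDPP API. It is a plain shared-memory tree reduction with one block per array, and it assumes the arrays are stored back to back in one float buffer. The names `reduce_arrays_kernel`, `reduce_arrays`, and `kThreads` are hypothetical, and error checking is omitted.

```cuda
#include <cstddef>
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // must be a power of two for the halving loop below

// One block per array: block b reduces input[b * length .. b * length + length)
// to the single value output[b].
__global__ void reduce_arrays_kernel(const float* input, float* output, int length) {
    __shared__ float partial[kThreads];
    const float* array = input + static_cast<size_t>(blockIdx.x) * length;

    // Each thread first adds up a strided slice of the array.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < length; i += blockDim.x)
        sum += array[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        output[blockIdx.x] = partial[0];
}

// Hypothetical host wrapper: reduces `count` arrays of `length` floats each,
// all in a single kernel launch.
std::vector<float> reduce_arrays(const std::vector<float>& host_input, int count, int length) {
    size_t input_bytes = sizeof(float) * static_cast<size_t>(count) * length;
    float* d_input = nullptr;
    float* d_output = nullptr;
    cudaMalloc(&d_input, input_bytes);
    cudaMalloc(&d_output, sizeof(float) * count);
    cudaMemcpy(d_input, host_input.data(), input_bytes, cudaMemcpyHostToDevice);

    reduce_arrays_kernel<<<count, kThreads>>>(d_input, d_output, length);

    std::vector<float> results(count);
    cudaMemcpy(results.data(), d_output, sizeof(float) * count, cudaMemcpyDeviceToHost);
    cudaFree(d_input);
    cudaFree(d_output);
    return results;
}
```

With the numbers above, the call would be `reduce_arrays(data, 9600, 4800)`. A tuned library routine is likely to be faster than this sketch, which only aims to show the structure.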

I think your problem is that you are launching 9600 × 2 kernels. This should be an easy algorithm to express as a single kernel.

The most naive way to implement it would not coalesce memory, but it could well be faster than the way you are doing it now.

Once you've got the naive way working, coalesce your memory reads: e.g. have every thread in a block read 16 consecutive floats into shared memory, call __syncthreads(), accumulate the relevant 16 floats into a register, call __syncthreads() again, then repeat.
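One possible reading of that load, sync, accumulate, sync loop is sketched below. It assumes each thread is responsible for the sum of one row of a row-major float matrix, which is the case where a naive per-thread loop would not coalesce. The names `row_sums_tiled_kernel`, `row_sums_tiled`, and `kTile` are hypothetical, the tile width of 16 is taken from the description above, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kTile = 16;  // tile width, from the "16 consecutive floats" above

// Launch with kTile threads per block; each block produces kTile row sums.
__global__ void row_sums_tiled_kernel(const float* matrix, float* sums, int rows, int cols) {
    // The +1 padding keeps the per-row reads below free of bank conflicts.
    __shared__ float tile[kTile][kTile + 1];

    int first_row = blockIdx.x * kTile;
    int my_row = first_row + threadIdx.x;
    float sum = 0.0f;  // running total for this thread's row

    for (int first_col = 0; first_col < cols; first_col += kTile) {
        // Cooperative load: for each row of the tile, the threads of the block
        // read consecutive floats, so the global reads are coalesced.
        int col = first_col + threadIdx.x;
        for (int r = 0; r < kTile; ++r) {
            int row = first_row + r;
            tile[r][threadIdx.x] =
                (row < rows && col < cols) ? matrix[row * cols + col] : 0.0f;
        }
        __syncthreads();

        // Each thread accumulates the kTile values that belong to its own row.
        for (int c = 0; c < kTile; ++c)
            sum += tile[threadIdx.x][c];
        __syncthreads();  // the tile is overwritten in the next iteration
    }

    if (my_row < rows)
        sums[my_row] = sum;
}

// Hypothetical host wrapper around the kernel.
std::vector<float> row_sums_tiled(const std::vector<float>& host_matrix, int rows, int cols) {
    float* d_matrix = nullptr;
    float* d_sums = nullptr;
    cudaMalloc(&d_matrix, sizeof(float) * rows * cols);
    cudaMalloc(&d_sums, sizeof(float) * rows);
    cudaMemcpy(d_matrix, host_matrix.data(), sizeof(float) * rows * cols,
               cudaMemcpyHostToDevice);

    int blocks = (rows + kTile - 1) / kTile;
    row_sums_tiled_kernel<<<blocks, kTile>>>(d_matrix, d_sums, rows, cols);

    std::vector<float> sums(rows);
    cudaMemcpy(sums.data(), d_sums, sizeof(float) * rows, cudaMemcpyDeviceToHost);
    cudaFree(d_matrix);
    cudaFree(d_sums);
    return sums;
}
```

Blocks of 16 threads keep the sketch close to the description but leave half of each warp idle on current hardware. A tuned version would probably use wider tiles or more rows per block.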

The NVIDIA GPU Computing SDK has lots of examples of reduction techniques.
