how fast is thrust::sort and what is the fastest radix sort implementation

牧云@^-^@ 提交于 2019-12-21 22:11:18

问题


I'm a newbie to GPU programming. Recently, I'm trying to implement the gpu bvh construction algorithm based on an tutorial: http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/. In the first step of this algorithm, the morton code(unsigned int) of every primitive is computed and sorted. The tutorial gives a reference time cost of the morton code computing and sorting for 12K objects:

  1. 0.02 ms, one thread per object: Calculate bounding box and assign Morton code.
  2. 0.18 ms, parallel radix sort: Sort the objects according to their Morton codes. ...

In my implementation, the first step cost 0.1ms and the sorting step costs 1.8ms. I'm using thrust to do the sorting. So what is the fastest implementation of radix sort on GPU?

I'm using a Geforce Titan GPU which should faster than the Geforce GTX690 used by the author of the tutorial. Here is my test code for sorting, it costs about 1.5ms even when the size is 10.

void testSort()
{
    int sz = 10;
    thrust::host_vector<unsigned int> h_keys(sz);
    for(int i=0; i<sz; i++)
        h_keys[i] = rand();
    thrust::device_ptr<unsigned int> keys = thrust::device_malloc<unsigned int>(sz);
    thrust::copy(h_keys.begin(),h_keys.end(),keys);
    cudaEvent_t estart, estop;
    cudaEventCreate( &estart );
    cudaEventCreate( &estop );
    cudaEventRecord( estart, 0 );
    thrust::stable_sort(keys,keys+sz);
    cudaEventRecord( estop, 0 ) ;
    cudaEventSynchronize( estop );
    float elapsedTime;
    cudaEventElapsedTime( &elapsedTime,
        estart, estop ) ;
    printf( "Time to sort: %3.1f ms\n", elapsedTime );
    cudaEventDestroy( estart ) ;
    cudaEventDestroy( estop ) ;
}

回答1:


There is a Radix sort implementation for GPGPU by back40computing. They provide a performance comparison chart with which they claim that their implementation is the fastest.



来源:https://stackoverflow.com/questions/20854838/how-fast-is-thrustsort-and-what-is-the-fastest-radix-sort-implementation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!