Determining the least element and its position in each matrix column with CUDA Thrust

清歌不尽 2020-12-17 02:13

I have a fairly simple problem but I cannot figure out an elegant solution to it.

I have Thrust code which produces c vectors of the same size containing

3 answers
  •  盖世英雄少女心
    2020-12-17 03:04

    Since your vectors all have the same length, it is better to concatenate them and treat the result as a matrix C.

    Your problem then becomes finding the row index of the minimum element of each column in a row-major matrix. It can be solved in two steps.

    1. reorder the elements from row-major to column-major;
    2. find the index of the minimum in each column.

    In step 1, you proposed using stable_sort_by_key to rearrange the elements, which is not an efficient method, since the rearrangement can be computed directly from the number of rows and columns of the matrix. In Thrust, it can be done with a permutation iterator:

    thrust::make_permutation_iterator(
        c.begin(),
        thrust::make_transform_iterator(
            thrust::make_counting_iterator((int) 0),
            (_1 % row) * col + _1 / row)
    )
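
    The index arithmetic in the snippet above can be checked on the host without Thrust. The following is a minimal sketch in plain C++; `gather_index` and `to_column_major` are hypothetical helper names introduced here for illustration and are not part of Thrust.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a Thrust function): for position i of the
// column-major sequence, return the source index in the row-major array.
// Same arithmetic as the placeholder expression (_1 % row) * col + _1 / row.
inline int gather_index(int i, int row, int col) {
    assert(row > 0 && col > 0 && i >= 0 && i < row * col);
    return (i % row) * col + i / row;
}

// Host-side equivalent of reading every element through the permutation
// iterator: out[i] = c[gather_index(i)].
std::vector<float> to_column_major(const std::vector<float>& c, int row, int col) {
    assert(static_cast<int>(c.size()) == row * col);
    std::vector<float> out(c.size());
    for (int i = 0; i < row * col; ++i) {
        out[i] = c[gather_index(i, row, col)];
    }
    return out;
}
```

    For a 2 x 5 row-major input { 0, 10, 20, 3, 40, 1, 2, 3, 5, 10 }, this yields { 0, 1, 10, 2, 20, 3, 3, 5, 40, 10 }, so each consecutive pair of elements is one column. The permutation iterator produces the same order lazily, without materializing the reordered copy.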
    

    In step 2, reduce_by_key does exactly what you want. In your case the binary reduction functor is simple: comparison of tuples (the elements of your zipped vector) is already defined lexicographically, so the first element of the tuple (the value) is compared first, and Thrust provides the corresponding minimum as

    thrust::minimum< thrust::tuple<float, int> >()
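
    To illustrate why a tuple minimum carries the index along with the value, here is a small host-side sketch. It uses std::tuple and std::min as stand-ins, on the assumption that thrust::tuple orders its elements the same lexicographic way; `ValIdx` and `min_with_index` are names made up for this example.

```cpp
#include <algorithm>
#include <tuple>

// (value, row index) pair, mirroring one element of the zipped sequence.
using ValIdx = std::tuple<float, int>;

// std::tuple's operator< is lexicographic: the values are compared first and
// the row index only breaks ties, so the smaller tuple is the one holding the
// smaller value, together with the row it came from.
inline ValIdx min_with_index(const ValIdx& a, const ValIdx& b) {
    return std::min(a, b);
}
```

    For example, min_with_index(ValIdx(10.0f, 0), ValIdx(2.0f, 1)) returns (2, 1). When two values are equal, the tuple with the smaller row index wins.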
    

    The whole program is shown below. Thrust 1.6.0+ is required because it uses placeholders in fancy iterators.

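    #include <iostream>
    #include <iterator>
    #include <algorithm>
    
    #include <thrust/device_vector.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/iterator/permutation_iterator.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/iterator/discard_iterator.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    
    using namespace thrust::placeholders;
    
    int main()
    {
    
        const int row = 2;
        const int col = 5;
        // Row-major 2 x 5 matrix: row 0 = {0, 10, 20, 3, 40}, row 1 = {1, 2, 3, 5, 10}
        float initc[] =
                { 0, 10, 20, 3, 40, 1, 2, 3, 5, 10 };
        thrust::device_vector<float> c(initc, initc + row * col);
    
        thrust::device_vector<float> minval(col);
        thrust::device_vector<int> minidx(col);
    
        thrust::reduce_by_key(
                // Keys: i / row is the column number, so each column is one segment
                thrust::make_transform_iterator(
                        thrust::make_counting_iterator((int) 0),
                        _1 / row),
                thrust::make_transform_iterator(
                        thrust::make_counting_iterator((int) 0),
                        _1 / row) + row * col,
                // Values: (element read in column-major order, its row index)
                thrust::make_zip_iterator(
                        thrust::make_tuple(
                                thrust::make_permutation_iterator(
                                        c.begin(),
                                        thrust::make_transform_iterator(
                                                thrust::make_counting_iterator((int) 0), (_1 % row) * col + _1 / row)),
                                thrust::make_transform_iterator(
                                        thrust::make_counting_iterator((int) 0), _1 % row))),
                // Output keys are not needed
                thrust::make_discard_iterator(),
                // Output values: per-column minimum and the row it came from
                thrust::make_zip_iterator(
                        thrust::make_tuple(
                                minval.begin(),
                                minidx.begin())),
                thrust::equal_to<int>(),
                thrust::minimum<thrust::tuple<float, int> >()
        );
    
        // Print the row index of the minimum of each column
        std::copy(minidx.begin(), minidx.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
        return 0;
    }
    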
    

    Two remaining issues may affect the performance.

    1. the minimum values must be written out even though only the indices are required;
    2. reduce_by_key is designed for segments of varying length, so it may not be the fastest algorithm for reducing segments of equal length.

    Writing your own kernel could be the best solution for highest performance.
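
    As a rough illustration of what such a kernel would compute, here is a host-side reference in plain C++ rather than CUDA, so it can be checked without a GPU. The function name `column_argmin` is hypothetical. The body of the outer loop is the work one thread would do if a kernel assigned one thread per column; the launch configuration and any tuning are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side reference, not a tuned implementation.
// c is a row-major row x col matrix; the result holds, for each column,
// the row index of its smallest element (first occurrence on ties).
std::vector<int> column_argmin(const std::vector<float>& c, int row, int col) {
    assert(row > 0 && col > 0);
    assert(static_cast<int>(c.size()) == row * col);
    std::vector<int> minidx(col);
    for (int j = 0; j < col; ++j) {          // in a kernel: j = global thread index
        int best = 0;
        for (int i = 1; i < row; ++i) {      // walk down column j
            if (c[i * col + j] < c[best * col + j]) {
                best = i;
            }
        }
        minidx[j] = best;
    }
    return minidx;
}
```

    With the 2 x 5 input { 0, 10, 20, 3, 40, 1, 2, 3, 5, 10 } this returns { 0, 1, 1, 0, 1 }. Unlike the reduce_by_key version, it writes only the indices and needs no reordering step, because column j is read directly with a stride of col.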
