fp16 support in cuda thrust

Submitted by 纵饮孤独 on 2019-12-23 03:11:30

Question


I am not able to find anything about fp16 support in the Thrust CUDA template library. Even the roadmap page has nothing about it: https://github.com/thrust/thrust/wiki/Roadmap

But I assume somebody has probably figured out how to overcome this problem, since fp16 support in CUDA has been around for more than 6 months.

As of today, I rely heavily on Thrust in my code, and I have templated nearly every class I use in order to ease fp16 integration. Unfortunately, absolutely nothing works out of the box for the half type, not even this simple sample code:

//STL
#include <iostream>
#include <cstdlib>

//Cuda
#include <cuda_runtime_api.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cuda_fp16.h>
#define T half //works when float is used

int main(int argc, char* argv[])
{
        thrust::device_vector<T> a(10,1.0f);
        float t = thrust::reduce( a.cbegin(),a.cend(),(float)0);
        std::cout<<"test = "<<t<<std::endl;
        return EXIT_SUCCESS;
}

This code does not compile, because there seems to be no implicit conversion from float to half or from half to float. However, CUDA does provide intrinsics for explicit conversion.

Why can't I simply overload the half and float constructors in some CUDA header file to add those intrinsics, like this:

float::float( half a )
{
  return  __half2float( a ) ;
}

half::half( float a )
{
  return  __float2half( a ) ;
}

My question may seem basic, but I don't understand why I haven't found much documentation about it.

Thank you in advance


Answer 1:


The very short answer is that what you are looking for doesn't exist.

The slightly longer answer is that Thrust is intended to work only on fundamental and POD types, and the CUDA fp16 half is not a POD type. It might be possible to make two custom classes (one for the host and one for the device) that implement all the required object semantics and arithmetic operators to work correctly with Thrust, but it would be a significant effort to do so (and it would require writing or adapting an existing fp16 host library).

Note also that the current fp16 support is device code only, and only on devices of compute capability 5.3 and newer. So unless you have a Tegra TX1, you can't use the fp16 library in device code anyway.



Source: https://stackoverflow.com/questions/36196829/fp16-support-in-cuda-thrust
