std::num_put issue with nan-boxing due to auto-cast from float to double

Question

I'm following one post to extend NaN values with some extra info, and another post to modify std::cout's behaviour so that it displays this extra info.

Here is the code defining the functions and NumPut class:

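#include <iostream>
#include <cassert>
#include <limits>
#include <bitset>
#include <cmath>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>    // std::string
#include <algorithm> // std::copy
#include <iterator>  // std::begin, std::end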

template <typename T>
void showValue( T val, const std::string& what )
{
    union uT {
      T d;
      unsigned long long u;
    };
    uT ud;
    ud.d = val;
    std::bitset<sizeof(T) * 8> b(ud.u);
    std::cout << val << " (" << what << "): " << b.to_string() << std::endl;
}

template <typename T>
T customizeNaN( T value, char mask )
{
    T res = value;
    char* ptr = (char*) &res;
    assert( ptr[0] == 0 );
    ptr[0] |= mask;
    return res;
}

template <typename T>
bool isCustomNaN( T value, char mask )
{
    char* ptr = (char*) &value;
    return ptr[0] == mask;
}

template <typename T>
char getCustomNaNMask( T value )
{
    char* ptr = (char*) &value;
    return ptr[0];
}

template <typename Iterator = std::ostreambuf_iterator<char> >
class NumPut : public std::num_put<char, Iterator>
{
private:
    using base_type = std::num_put<char, Iterator>;

public:
    using char_type = typename base_type::char_type;
    using iter_type = typename base_type::iter_type;

    NumPut(std::size_t refs = 0)
    :   base_type(refs)
    {}

protected:
    virtual iter_type do_put(iter_type out, std::ios_base& str, char_type fill, double v) const override {
        if(std::isnan(v))
        {
            char mask = getCustomNaNMask(v);
            if ( mask == 0x00 )
            {
                out = std::copy(std::begin(NotANumber), std::end(NotANumber), out);
            }
            else
            {
                std::stringstream maskStr;
                maskStr << "(0x" << std::hex << (unsigned) mask << ")";
                std::string temp = maskStr.str();
                out = std::copy(std::begin(CustomNotANumber), std::end(CustomNotANumber), out);
                out = std::copy(std::begin(temp), std::end(temp), out);
            }
        }
        else
        {
            out = base_type::do_put(out, str, fill, v);
        }
        return out;
    }

private:
    static const std::string NotANumber;
    static const std::string CustomNotANumber;
};

template<typename Iterator> const std::string NumPut<Iterator>::NotANumber = "Not a Number";
template<typename Iterator> const std::string NumPut<Iterator>::CustomNotANumber = "Custom Not a Number";

inline void fixNaNToStream( std::ostream& str )
{
    str.imbue( std::locale(str.getloc(), new NumPut<std::ostreambuf_iterator<char>>() ) );
}

A simple test function:

template<typename T>
void doTest()
{
    T regular_nan = std::numeric_limits<T>::quiet_NaN();
    T myNaN1 = customizeNaN( regular_nan, 0x01 );
    T myNaN2 = customizeNaN( regular_nan, 0x02 );

    showValue( regular_nan, "regular" );
    showValue( myNaN1, "custom 1" );
    showValue( myNaN2, "custom 2" );
}

My main program:

int main(int argc, char *argv[])
{
    fixNaNToStream( std::cout );

    doTest<double>();
    doTest<float>();

    return 0;
}

doTest<double> outputs:

Not a Number (regular): 0111111111111000000000000000000000000000000000000000000000000000
Custom Not a Number(0x1) (custom 1): 0111111111111000000000000000000000000000000000000000000000000001
Custom Not a Number(0x2) (custom 2): 0111111111111000000000000000000000000000000000000000000000000010

doTest<float> outputs:

Not a Number (regular): 01111111110000000000000000000000
Not a Number (custom 1): 01111111110000000000000000000001
Not a Number (custom 2): 01111111110000000000000000000010

While I would expect for float:

Not a Number (regular): 01111111110000000000000000000000
Custom Not a Number(0x1) (custom 1): 01111111110000000000000000000001
Custom Not a Number(0x2) (custom 2): 01111111110000000000000000000010

The problem is that num_put only has a virtual do_put for double, not for float. So my float is silently cast to a double, losing my extended information.

I know there are some alternatives, like using FloatFormat from the second post, or simply writing a smart float2double function and calling it before sending my NaN value to the output stream, but they require the developer to take care of this situation, and they may forget to.

Is there no way to implement that within the NumPut class, or anything else that would simply make things work when a float is sent to the imbued stream as nicely as it works for a double?

My requirement is to be able to simply call a function like fixNaNToStream for any output stream (std::cout, local std::stringstream, ...) and then send float and double to it and get them identified as my custom NaNs and displayed accordingly.

Answer 1:

"The problem is that num_put only has a virtual do_put for double, not for float. So my float is silently cast to a double, losing my extended information."

The information is lost because the positions of the bits carrying it are different when the number is converted from float to double:

// Assuming an IEEE-754 floating-point representation of float and double
// Layout: sign, exponent, fraction (float first, then the same NaN as a double)
0 11111111 10000000000000000000010
0 11111111111 1000000000000000000001000000000000000000000000000000

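Note that, counted from the sign bit, the mantissa bits start 3 positions later in the double, because its exponent requires 3 more bits (11 instead of 8). The float's 23 mantissa bits become the 23 most significant of the double's 52 mantissa bits, so a bit that sat at the far right of the float is no longer in the lowest byte of the double.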

Also, it's worth noting what is stated on this page: https://en.cppreference.com/w/cpp/numeric/math/isnan

"Copying a NaN is not required, by IEEE-754, to preserve its bit representation (sign and payload), though most implementations do."

I assume the same holds for casting such values, so that, even ignoring other causes of undefined behavior in OP's code, whether a given method of NaN-boxing works or not is actually implementation-defined.

In my earlier attempts at answering this question, I used explicit bit shifting by different offsets to achieve the result, but as jpo38 also found out, the easiest way is to always generate a float NaN and then cast it to the target type.

The Standard Library function std::nanf could be used to generate a "customized" float NaN, but in the following demo snippet I won't use it.

#include <cstdint>
#include <limits>
#include <cstring>
#include <cassert>
#include <type_traits>
#include <iostream>
#include <bitset>
#include <array>
#include <climits>

namespace my {

// Waiting for C++20 std::bit_cast
// source: https://en.cppreference.com/w/cpp/numeric/bit_cast
template <class To, class From>
typename std::enable_if<
    (sizeof(To) == sizeof(From)) &&
    std::is_trivially_copyable<From>::value &&
    std::is_trivial<To>::value,
    // this implementation requires that To is trivially default constructible
    To>::type
// constexpr support needs compiler magic
bit_cast(const From &src) noexcept
{
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

template <typename T, std::size_t Size = sizeof(T)>
void print_bits(T x)
{
    std::array<unsigned char, Size> buf;
    std::memcpy(buf.data(), &x, Size);
    for (auto it = buf.crbegin(); it != buf.crend(); ++it)
    {
        std::bitset<CHAR_BIT> b{*it};
        std::cout << b.to_string();
    }
    std::cout << '\n';
}

// The following assumes that both floats and doubles store the mantissa
// in the lower bits and that while casting a NaN (float->double or double->float)
// the most significant of those aren't changed
template <typename T>
auto boxed_nan(uint8_t data = 0) -> typename std::enable_if<std::numeric_limits<T>::has_quiet_NaN, T>::type
{
    return bit_cast<float>(
        bit_cast<uint32_t>(std::numeric_limits<float>::quiet_NaN()) |
        static_cast<uint32_t>(data)
    );
}

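template <typename T>
uint8_t unbox_nan(T num)
{
    // Narrow to float first, then keep only the lowest 8 bits,
    // which is where boxed_nan stored the data
    return static_cast<uint8_t>(bit_cast<uint32_t>(static_cast<float>(num)) & 0xFFu);
}

} // End of namespace 'my'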


int main()
{
    auto my_nan = my::boxed_nan<float>(42);
    my::print_bits(my_nan);
    my::print_bits(static_cast<double>(my_nan));
    assert(my::unbox_nan(my_nan) == 42);
    assert(my::unbox_nan(static_cast<double>(my_nan)) == 42);

    auto my_d_nan = my::boxed_nan<double>(17);
    my::print_bits(my_d_nan);
    my::print_bits(static_cast<float>(my_d_nan));
    assert(my::unbox_nan(my_d_nan) == 17);
    assert(my::unbox_nan(static_cast<float>(my_d_nan)) == 17);

    auto my_ld_nan = my::boxed_nan<long double>(9);
    assert(my::unbox_nan(my_ld_nan) == 9);
    assert(my::unbox_nan(static_cast<double>(my_ld_nan)) == 9);
}

Answer 2:

As Bob pointed out, if you want the cast to work both ways (from float to double and from double to float), the extended bits of the double must sit at the same position relative to the biased exponent as they do in the float.

Considering that, a very simple way to handle this is to use the rightmost bits of the float. For double, instead of trying to determine manually which bits should be used, simply use cast operations and let the system work out where the right place is.

The code then becomes:

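#include <iostream>
#include <cassert>
#include <limits>
#include <bitset>
#include <cmath>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>    // std::string
#include <algorithm> // std::copy
#include <iterator>  // std::begin, std::end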

template <typename T>
void showValue( T val, const std::string& what )
{
    union uT {
      T d;
      unsigned long long u;
    };
    uT ud;
    ud.d = val;
    std::bitset<sizeof(T) * 8> b(ud.u);
    std::cout << val << " (" << what << "): " << b.to_string() << std::endl;
}

char& getCustomNaNMask( float& value )
{
    char* ptr = (char*) &value;
    return ptr[0];
}

/** The temp parameter only selects the overload: two functions cannot differ by return type alone. */
float getCustomizedNaN( char mask, float temp )
{
    // let's reuse the temp argument as we need a local float variable
    temp = std::numeric_limits<float>::quiet_NaN();
    getCustomNaNMask(temp) |= mask;
    return temp;
}

/** The temp parameter only selects the overload: two functions cannot differ by return type alone. */
double getCustomizedNaN( char mask, double /*temp*/ )
{
    float asFloat = getCustomizedNaN( mask, float() );
    // Let the system correctly cast from float to double, that's it!
    return static_cast<double>( asFloat );
}

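template <typename T>
bool isCustomNaN( T value, char mask )
{
    // getCustomNaNMask takes a float&, so narrow first;
    // passing a double directly would not compile
    float asFloat = static_cast<float>( value );
    return getCustomNaNMask(asFloat) == mask;
}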

template <typename Iterator = std::ostreambuf_iterator<char> >
class NumPut : public std::num_put<char, Iterator>
{
private:
    using base_type = std::num_put<char, Iterator>;

public:
    using char_type = typename base_type::char_type;
    using iter_type = typename base_type::iter_type;

    NumPut(std::size_t refs = 0)
    :   base_type(refs)
    {}

protected:
    virtual iter_type do_put(iter_type out, std::ios_base& str, char_type fill, double v) const override {
        if(std::isnan(v))
        {
            float asFloat = static_cast<float>( v );
            char& mask = getCustomNaNMask(asFloat);
            if ( mask == 0x00 )
            {
                out = std::copy(std::begin(NotANumber), std::end(NotANumber), out);
            }
            else
            {
                std::stringstream maskStr;
                maskStr << "(0x" << std::hex << (unsigned) mask << ")";
                std::string temp = maskStr.str();
                out = std::copy(std::begin(CustomNotANumber), std::end(CustomNotANumber), out);
                out = std::copy(std::begin(temp), std::end(temp), out);
            }
        }
        else
        {
            out = base_type::do_put(out, str, fill, v);
        }
        return out;
    }

private:
    static const std::string NotANumber;
    static const std::string CustomNotANumber;
};

template<typename Iterator> const std::string NumPut<Iterator>::NotANumber = "Not a Number";
template<typename Iterator> const std::string NumPut<Iterator>::CustomNotANumber = "Custom Not a Number";

inline void fixNaNToStream( std::ostream& str )
{
    str.imbue( std::locale(str.getloc(), new NumPut<std::ostreambuf_iterator<char>>() ) );
}

And the test program:

template<typename T>
void doTest()
{
    T regular_nan = std::numeric_limits<T>::quiet_NaN();
    T myNaN1 = getCustomizedNaN( 0x01, T() );
    T myNaN2 = getCustomizedNaN( 0x02, T() );

    showValue( regular_nan, "regular" );
    showValue( myNaN1, "custom 1" );
    showValue( myNaN2, "custom 2" );
}

int main(int argc, char *argv[])
{
    fixNaNToStream( std::cout );

    doTest<double>();
    doTest<float>();

    return 0;
}

Outputs:

Not a Number (regular): 0111111111111000000000000000000000000000000000000000000000000000
Custom Not a Number(0x1) (custom 1): 0111111111111000000000000000000000100000000000000000000000000000
Custom Not a Number(0x2) (custom 2): 0111111111111000000000000000000001000000000000000000000000000000
Not a Number (regular): 01111111110000000000000000000000
Custom Not a Number(0x1) (custom 1): 01111111110000000000000000000001
Custom Not a Number(0x2) (custom 2): 01111111110000000000000000000010

Thanks Bob!

Source: https://stackoverflow.com/questions/53713992/stdnum-put-issue-with-nan-boxing-due-to-auto-cast-from-float-to-double

Tags: c++, c++11, nan, ostream