constexpr and endianness

问题

A common question that comes up from time to time in the world of C++ programming is compile-time determination of endianness. Usually this is done with barely portable #ifdefs. But does the C++11 constexpr keyword along with template specialization offer us a better solution to this?

Would it be legal C++11 to do something like:

constexpr bool little_endian()
{
   const static unsigned num = 0xAABBCCDD;
   return reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD;
}

And then specialize a template for both endian types:

template <bool LittleEndian>
struct Foo 
{
  // .... specialization for little endian
};

template <>
struct Foo<false>
{
  // .... specialization for big endian
};

And then do:

Foo<little_endian()>::do_something();

回答1:

Assuming N2116 is the wording that gets incorporated, then your example is ill-formed (notice that there is no concept of "legal/illegal" in C++). The proposed text for [decl.constexpr]/3 says

its function-body shall be a compound-statement of the form { return expression; } where expression is a potential constant expression (5.19);

Your function violates the requirement in that it also declares a local variable.

Edit: This restriction could be overcome by moving num outside of the function. The function still wouldn't be well-formed, then, because expression needs to be a potential constant expression, which is defined as

An expression is a potential constant expression if it is a constant expression when all occurrences of function parameters are replaced by arbitrary constant expressions of the appropriate type.

IOW, reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD would have to be a constant expression. However, it is not: &num would be a address constant-expression (5.19/4). Accessing the value of such a pointer is, however, not allowed for a constant expression:

The subscripting operator [] and the class member access . and operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators.

Edit: The above text is from C++98. Apparently, C++0x is more permissive what it allows for constant expressions. The expression involves an lvalue-to-rvalue conversion of the array reference, which is banned from constant expressions unless

it is applied to an lvalue of effective integral type that refers to a non-volatile const variable or static data member initialized with constant expressions

It's not clear to me whether (&num)[0] "refers to" a const variable, or whether only a literal num "refers to" such a variable. If (&num)[0] refers to that variable, it is then unclear whether reinterpret_cast<const unsigned char*> (&num)[0] still "refers to" num.

回答2:

I was able to write this:

#include <cstdint>

class Endian
{
private:
    static constexpr uint32_t uint32_ = 0x01020304;
    static constexpr uint8_t magic_ = (const uint8_t&)uint32_;
public:
    static constexpr bool little = magic_ == 0x04;
    static constexpr bool middle = magic_ == 0x02;
    static constexpr bool big = magic_ == 0x01;
    static_assert(little || middle || big, "Cannot determine endianness!");
private:
    Endian() = delete;
};

I've tested it with g++ and it compiles without warnings. It gives a correct result on x64. If you have any big-endian or middle-endian proccesor, please, confirm that this works for you in a comment.

回答3:

It is not possible to determine endianness at compile time using constexpr (before C++20). reinterpret_cast is explicitly forbidden by [expr.const]p2, as is iain's suggestion of reading from a non-active member of a union. Casting to a different reference type is also forbidden, as such a cast is interpreted as a reinterpret_cast.

Update:

This is now possible in C++20. One way (live):

#include <bit>
template<std::integral T>
constexpr bool is_little_endian() {
  for (unsigned bit = 0; bit != sizeof(T) * CHAR_BIT; ++bit) {
    unsigned char data[sizeof(T)] = {};
    // In little-endian, bit i of the raw bytes ...
    data[bit / CHAR_BIT] = 1 << (bit % CHAR_BIT);
    // ... corresponds to bit i of the value.
    if (std::bit_cast<T>(data) != T(1) << bit)
      return false;
  }
  return true;
}
static_assert(is_little_endian<int>());

(Note that C++20 guarantees two's complement integers -- with an unspecified bit order -- so we just need to check that every bit of the data maps to the expected place in the integer.)

But if you have a C++20 standard library, you can also just ask it:

#include <type_traits>
constexpr bool is_little_endian = std::endian::native == std::endian::little;

回答4:

That is a very interesting question.

I am not Language Lawyer, but you might be able to replace the reinterpret_cast with a union.

const union {
    int int_value;
    char char_value[4];
} Endian = { 0xAABBCCDD };

constexpr bool little_endian()
{
   return Endian[0] == 0xDD;
}

回答5:

My first post. Just wanted to share some code that I'm using.

//Some handy defines magic, thanks overflow
#define IS_LITTLE_ENDIAN  ('ABCD'==0x41424344UL) //41 42 43 44 = 'ABCD' hex ASCII code
#define IS_BIG_ENDIAN     ('ABCD'==0x44434241UL) //44 43 42 41 = 'DCBA' hex ASCII code
#define IS_UNKNOWN_ENDIAN (IS_LITTLE_ENDIAN == IS_BIG_ENDIAN)

//Next in code...
struct Quad
{
    union
    {
#if IS_LITTLE_ENDIAN
        struct { std::uint8_t b0, b1, b2, b3; };

#elif IS_BIG_ENDIAN
        struct { std::uint8_t b3, b2, b1, b0; };

#elif IS_UNKNOWN_ENDIAN
#error "Endianness not implemented!"
#endif

        std::uint32_t dword;
    };
};

Constexpr version:

namespace Endian
{
    namespace Impl //Private
    {
        //41 42 43 44 = 'ABCD' hex ASCII code
        static constexpr std::uint32_t LITTLE_{ 0x41424344u };

        //44 43 42 41 = 'DCBA' hex ASCII code
        static constexpr std::uint32_t BIG_{ 0x44434241u };

        //Converts chars to uint32 on current platform
        static constexpr std::uint32_t NATIVE_{ 'ABCD' };
    }



    //Public
    enum class Type : size_t { UNKNOWN, LITTLE, BIG };

    //Compare
    static constexpr bool IS_LITTLE   = Impl::NATIVE_ == Impl::LITTLE_;
    static constexpr bool IS_BIG      = Impl::NATIVE_ == Impl::BIG_;
    static constexpr bool IS_UNKNOWN  = IS_LITTLE == IS_BIG;

    //Endian type on current platform
    static constexpr Type NATIVE_TYPE = IS_LITTLE ? Type::LITTLE : IS_BIG ? Type::BIG : Type::UNKNOWN;



    //Uncomment for test. 
    //static_assert(!IS_LITTLE, "This platform has little endian.");
    //static_assert(!IS_BIG_ENDIAN, "This platform has big endian.");
    //static_assert(!IS_UNKNOWN, "Error: Unsupported endian!");
}

回答6:

There is std::endian in the upcoming C++20.

#include <type_traits>

constexpr bool little_endian() noexcept
{
    return std::endian::native == std::endian::little;
}

回答7:

This may seem like cheating, but you can always include endian.h... BYTE_ORDER == BIG_ENDIAN is a valid constexpr...

回答8:

If your goal is to insure that the compiler optimizes little_endian() into a constant true or false at compile-time, without any of its contents winding up in the executable or being executed at runtime, and only generating code from the "correct" one of your two Foo templates, I fear you're in for a disappointment.

I also am not a language lawyer, but it looks to me like constexpr is like inline or register: a keyword that alerts the compiler writer to the presence of a potential optimization. Then it's up to the compiler writer whether or not to take advantage of that. Language specs typically mandate behaviors, not optimizations.

Also, have you actually tried this on a variety of C++0x complaint compilers to see what happens? I would guess most of them would choke on your dual templates, since they won't be able to figure out which one to use if invoked with false.

来源：https://stackoverflow.com/questions/1583791/constexpr-and-endianness

标签

c++

c++11

endianness