C/C++ Why to use unsigned char for binary data?

-上瘾入骨i 2020-12-07 10:27

Is it really necessary to use unsigned char to hold binary data as in some libraries which work on character encoding or binary buffers? To make sense of my que

8 Answers
  • 2020-12-07 11:13

    I am asking why something which seems to be working as fine with char should be typed unsigned char?

    If you do things which are not "correct" in the sense of the standard, you rely on undefined behaviour. Your compiler might do it the way you want today, but you don't know what it does tomorrow. You don't know what GCC does or VC++ 2012. Or even if the behaviour depends on external factors or Debug/Release compiles etc. As soon as you leave the safe path of the standard, you might run into trouble.

  • 2020-12-07 11:19

    The plain char type is problematic and shouldn't be used for anything but strings. The main problem with char is that you can't know whether it is signed or unsigned: this is implementation-defined. This makes char different from int and the other integer types: plain int is always guaranteed to be signed.
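
    As a minimal sketch of the point above, the signedness of plain char can be queried from <limits.h>. The helper names plain_char_is_signed, int_is_signed and check_char_range are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when plain char is signed on this
   implementation. CHAR_MIN is 0 when char is unsigned and negative
   when it is signed. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: int is signed on every conforming
   implementation, so this returns true everywhere. */
static bool int_is_signed(void)
{
    return INT_MIN < 0;
}

/* Plain char has the same range as either signed char or
   unsigned char; which one is the implementation's choice. */
static void check_char_range(void)
{
    assert(CHAR_MAX == SCHAR_MAX || CHAR_MAX == UCHAR_MAX);
}
```

    The result of plain_char_is_signed() can differ between compilers and targets, and with some compilers between option sets, which is why code that depends on it is not portable.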

    Although VC gave the warning ... truncation of constant value

    It is telling you that you are trying to store int literals inside char variables. This might be related to the signedness: if you try to store an integer with a value > 0x7F inside a signed char, unexpected things might happen. Formally, the result of such an out-of-range conversion is implementation-defined in C (C11 6.3.1.3), though in practice you'd just get a weird output when printing the result as an integer value stored inside a (signed) char.
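
    To illustrate the "weird output", here is a small sketch that adds up the bytes of a buffer twice, once through unsigned char and once through plain char. Both function names (sum_bytes, sum_chars) are hypothetical, and the comments assume 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a buffer byte by byte. Reading through
   unsigned char yields values in 0..UCHAR_MAX for every byte. */
static unsigned long sum_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned long sum = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Same loop through plain char. Where char is signed, bytes above
   0x7F are read as negative values, so the total differs from
   sum_bytes for the same data. */
static long sum_chars(const char *buf, size_t len)
{
    long sum = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

    For pure ASCII data the two functions agree; they diverge only once a byte above 0x7F appears and char happens to be signed.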

    In this specific case, the warning shouldn't matter.

    EDIT:

    In other related questions unsigned char is highlighted because it is the only (byte/smallest) data type which is guaranteed to have no padding by the C-specification.

    In theory, all integer types except unsigned char and signed char are allowed to contain "padding bits", as per C11 6.2.6.2:

    "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)."

    "For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits."

    The C standard is intentionally vague and fuzzy, allowing these theoretical padding bits because:

    • It allows different symbol tables than the standard 8-bit ones.
    • It allows implementation-defined signedness and weird signed integer formats such as one's complement or "sign and magnitude".
    • An integer may not necessarily use all bits allocated.

    However, in the real world outside the C standard, the following applies:

    • Symbol tables are almost certainly 8 bits (UTF-8 or ASCII). Some weird exceptions exist, but clean implementations use the standard type wchar_t when implementing symbol tables larger than 8 bits.
    • Signed integers are practically always two's complement.
    • An integer always uses all bits allocated.
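
    Whether a given implementation has padding bits can be checked from <limits.h>. The following is only a sketch with made-up helper names (count_value_bits, uint_padding_bits, uchar_padding_bits): it compares the storage bits of a type with the number of value bits implied by its maximum value.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: counts the value bits of an unsigned type,
   given that type's maximum value (which has the form 2^N - 1). */
static unsigned count_value_bits(unsigned long long max)
{
    unsigned bits = 0;

    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Padding bits in unsigned int: storage bits minus value bits.
   Expected to be 0 on mainstream platforms, but the C standard
   does not promise that. */
static unsigned uint_padding_bits(void)
{
    unsigned storage = (unsigned)(sizeof(unsigned int) * CHAR_BIT);
    unsigned value = count_value_bits(UINT_MAX);

    assert(value <= storage);
    return storage - value;
}

/* Padding bits in unsigned char: 0 on every conforming
   implementation, since UCHAR_MAX is required to be 2^CHAR_BIT - 1. */
static unsigned uchar_padding_bits(void)
{
    return (unsigned)CHAR_BIT - count_value_bits(UCHAR_MAX);
}
```

    On a typical desktop or embedded toolchain both functions return 0, which matches the real-world list above; only the unsigned char result is guaranteed by the standard.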

    So there is no real reason to use unsigned char or signed char just to dodge some theoretical scenario in the C standard.
