In standard C++ we have char and wchar_t for storing characters. char can store values between 0x00 and 0xFF. And wchar_t can store values between 0x0000 and 0xFFFF
Not quite:
sizeof(char) == 1, so 1 byte per character.
sizeof(wchar_t) == ? It depends on your system
(usually 4 on Unix, usually 2 on Windows).
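If you want to see what your own platform does, a minimal sketch like the following can help. The helper names `wchar_bits` and `wchar_fits_any_code_point` are made up for illustration; only `sizeof`, `CHAR_BIT` and `std::numeric_limits` come from the standard.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// sizeof(char) is 1 by definition; a byte has at least 8 bits.
static_assert(sizeof(char) == 1, "char is always one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: width of wchar_t in bits on this platform.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: can a single wchar_t hold every Unicode code point
// (0 .. 0x10FFFF)? Typically true with a 4-byte wchar_t and false with a
// 2-byte one.
constexpr bool wchar_fits_any_code_point() {
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max()) >= 0x10FFFFul;
}
```

Printing `wchar_bits()` and `wchar_fits_any_code_point()` from `main` shows which of the two camps your compiler falls into.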
Unicode characters consume up to 4-byte space.
Not quite. Unicode is not an encoding. Unicode is a standard that defines what each code point is, and code points are restricted to 21 bits. The low 16 bits define the character's position on a code plane, while the upper 5 bits define which plane the character is on (only planes 0 through 16 are used, so the largest code point is 0x10FFFF).
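The plane/position split is just a shift and a mask. Here is a small sketch of that arithmetic; the function names are hypothetical and are not part of any library.

```cpp
#include <cstdint>

// Largest code point defined by Unicode (top of plane 16).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Range check only: true if cp lies inside the Unicode code space.
constexpr bool is_in_code_space(std::uint32_t cp) {
    return cp <= kMaxCodePoint;
}

// Upper bits: which plane (0 = BMP, 1..16 = supplementary planes).
constexpr std::uint32_t plane_of(std::uint32_t cp) {
    return cp >> 16;
}

// Low 16 bits: position of the code point within its plane.
constexpr std::uint32_t offset_in_plane(std::uint32_t cp) {
    return cp & 0xFFFF;
}
```

For example, U+1F600 gives `plane_of(0x1F600) == 1` and `offset_in_plane(0x1F600) == 0xF600`, while anything below 0x10000 is on plane 0 (the BMP).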
There are several Unicode encodings (UTF-8, UTF-16 and UTF-32 being the most common); an encoding is how you store the code points in memory. There are practical differences between the three:
UTF-8: Great for storage and transport (as it is compact)
Bad because it is variable length
UTF-16: Horrible in nearly all regards
It is always large (at least 2 bytes per character) and it is still variable length
(anything not on the BMP needs to be encoded as a surrogate pair)
UTF-32: Great for in memory representations as it is fixed size
Bad because it takes 4 bytes for each character, which is usually overkill
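To make the size trade-off in the list above concrete, here is a rough sketch that computes how many bytes one code point takes in each encoding. The function names are my own, and the sketch assumes the input is a valid code point (no surrogates, nothing above 0x10FFFF).

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one code point in UTF-8 (1 to 4).
constexpr std::size_t utf8_bytes(std::uint32_t cp) {
    return cp < 0x80    ? 1
         : cp < 0x800   ? 2
         : cp < 0x10000 ? 3
                        : 4;
}

// Bytes needed in UTF-16: one 16-bit unit on the BMP,
// a surrogate pair (two units) for everything else.
constexpr std::size_t utf16_bytes(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Bytes needed in UTF-32: always one 32-bit unit.
constexpr std::size_t utf32_bytes(std::uint32_t /*cp*/) {
    return 4;
}
```

So plain ASCII such as 'A' (0x41) costs 1, 2 and 4 bytes respectively, the euro sign (0x20AC) costs 3, 2 and 4, and anything off the BMP such as 0x1F600 costs 4 bytes in all three.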
Personally, I use UTF-8 for transport and storage and UTF-32 for the in-memory representation of text.
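Going from one to the other at the boundary of your program could look roughly like this. This is a hand-rolled sketch, not a standard library facility; `utf8_to_utf32` is a hypothetical name, and it assumes the UTF-8 bytes arrive in a `std::string`. It throws on malformed input rather than trying to repair it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Rejects truncated sequences, bad continuation bytes, overlong forms,
// surrogates and values above U+10FFFF.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte tells us the sequence length and carries the top bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i < len)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte is 10xxxxxx and contributes 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Smallest code point that legitimately needs a sequence of each length.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Once the text is in a `std::u32string`, indexing and `size()` work per code point, which is the whole attraction of the fixed-size form; you would write the mirror-image encoder for the way back out.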