UTF-8 encodes each character in 1 to 4 bytes. A single char on my system is 1 byte. Should I use wchar_t as a precaution so that I will be able to f
No, you should not! The Unicode 4.0 standard (ISO 10646:2003) notes that:
The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text.
Under most circumstances, the "character nature" of UTF-8 text will not be relevant to your program, so treating it as an array of char elements, just like any other string, will be sufficient. If you need to extract individual characters, though, those characters should be stored in a type that is at least 21 bits wide (e.g., uint32_t), in order to accommodate all Unicode code points, which range up to U+10FFFF.
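To illustrate the second case, here is a minimal sketch of extracting one code point from a UTF-8 byte sequence into a uint32_t. The function name utf8_decode and its return convention (bytes consumed, or 0 on error) are my own choices for illustration, not part of any standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with at
 * most len bytes available. On success, stores the code point in *out
 * and returns the number of bytes consumed (1 to 4). Returns 0 if the
 * input is empty, truncated, or not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    if (len == 0) return 0;

    unsigned char b0 = s[0];
    size_t n;      /* total length of the sequence in bytes */
    uint32_t cp;   /* code point being assembled */
    uint32_t min;  /* smallest code point allowed for this length */

    if (b0 < 0x80)                { n = 1; cp = b0;        min = 0x0;     }
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80;    }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800;   }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (len < n) return 0;  /* sequence is cut off */

    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min) return 0;                      /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                 /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* UTF-16 surrogate */

    *out = cp;
    return n;
}
```

The text itself can stay in an ordinary char array. Cast the pointer to const unsigned char * when calling the helper, and advance by the returned byte count to walk the string one code point at a time.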