Question
What is the printf() formatting character for char8_t *? And also for char8_t?
I assume there is some C++20 decision, somewhere, but I could not find it.
There is also P1428, but that document does not mention anything about the printf() family vs. char8_t * or char8_t.
The "use std::cout" advice might be an answer. Unfortunately, that does not compile anymore.
// does not compile under C++20
// error : overload resolution selected deleted operator '<<'
// see P1423, proposal 7
std::cout << u8"A2";
std::cout << char8_t ('A');
For C2x and char8_t, please start from here.
Update
I have done some more tests with a single element from a u8 sequence.
And that indeed does not work. char8_t * to printf("%s") does work, but char8_t to printf("%c") is an accident waiting to happen.
Please see https://wandbox.org/permlink/6NQtkKeZ9JUFw4Sd. The problem is that, as things currently stand, char8_t is not implemented but char8_t * is. Let me repeat: there is no implemented type to hold a single element from a char8_t * sequence.
If you want a single u8 glyph, you need to code it as a u8 string:
char8_t const * single_glyph = u8"ア";
At present, the only reasonably sure way to print the above seems to be:
// works with warnings
std::printf("%s", single_glyph ) ;
To start reading on this subject, these two papers are probably required:
- http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2231.htm
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html
In that order.
My primary development environment is Visual Studio 2019, with both MSVC and Clang 8.0.1 as delivered with VS, using /std:c++latest. The dev machine is Windows 10 [Version 10.0.18362.476].
Answer 1:
I'm the author of the char8_t P0482 and P1423 proposals for C++ and the N2231 proposal for C (that has not yet been accepted).
Let's think about what the following should do:
printf("Hello %s\n", u8"Jöel");
std::cout << "Hello " << u8"Jöel" << "\n";
Actually, let's take a further step back. What encoding is expected on the receiver side of standard output? There are a few possibilities. If standard out is connected to a console/terminal, then the expected encoding is the one that the console/terminal is configured for. On a Windows system in the United States, this is likely to be CP437. On a UNIX/Linux system, this is likely UTF-8. On a z/OS system in the United States, this is likely EBCDIC code page 037. If standard out has been redirected, then the expected encoding is likely locale dependent. On a Windows system in the United States, that would mean the Active Code Page (ACP), likely Windows 1252. On UNIX/Linux and z/OS, it would likely be the same as the console/terminal (Windows is the odd system here that has different defaults for console encoding vs locale encoding).
Back to that example code. What is the expected or desired behavior for that UTF-8 encoded ö character (U+00F6, {LATIN SMALL LETTER O WITH DIAERESIS}, encoded as 0xC3 0xB6)? For Windows writing to the console, for the character to display properly, the encoded sequence would need to be transcoded to 0x94 while for Windows where locale dependent output is expected, it would need to be transcoded to 0xF6. For UNIX/Linux, the sequence should probably be passed through. For z/OS, it may need to be transcoded to 0xCC. But on all of these systems, these defaults are configurable (e.g., via the LANG environment variable).
Assuming that transcoding to a run-time determined encoding is the desired behavior, how should transcoding errors be handled? For example, what should happen if the target encoding lacks representation for ö? What if an ill-formed UTF-8 sequence is present? Should printf stop and report an error? Should std::cout throw an exception? Or should an implementation defined character such as U+FFFD {REPLACEMENT CHARACTER} or ? be substituted?
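To make the substitution option concrete, here is a minimal sketch that transcodes UTF-8 to ISO-8859-1 (whose byte for ö, 0xF6, matches the locale dependent Windows value mentioned above) and substitutes ? for anything unrepresentable or ill-formed. The function name utf8_to_latin1 and the fixed target encoding are assumptions made for illustration. A real implementation would pick the target at run time and validate more strictly (for example, overlong three- and four-byte forms are not checked here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: transcode UTF-8 bytes to ISO-8859-1 (Latin-1).
// Code points above U+00FF and ill-formed input are replaced, not reported.
std::string utf8_to_latin1(const std::string& in, char replacement = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);

        // Sequence length implied by the lead byte; 0 marks an invalid lead.
        std::size_t len = 0;
        if (lead < 0x80) len = 1;
        else if ((lead & 0xE0) == 0xC0) len = 2;
        else if ((lead & 0xF0) == 0xE0) len = 3;
        else if ((lead & 0xF8) == 0xF0) len = 4;

        // Every trailing byte must exist and look like 10xxxxxx.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }

        if (!ok) {                       // ill-formed: replace one byte, resync
            out += replacement;
            i += 1;
        } else if (len == 1) {           // ASCII passes through
            out += static_cast<char>(lead);
            i += 1;
        } else {
            unsigned cp = 0;
            if (len == 2) {
                cp = ((lead & 0x1Fu) << 6) |
                     (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            }
            // Two-byte forms below U+0080 are overlong; longer forms exceed Latin-1.
            const bool representable = len == 2 && cp >= 0x80 && cp <= 0xFF;
            out += representable ? static_cast<char>(cp) : replacement;
            i += len;
        }
    }
    return out;
}
```

Feeding it the bytes 0xC3 0xB6 yields the single byte 0xF6, while a glyph such as ア (0xE3 0x82 0xA2) or a truncated sequence becomes ?. Silent substitution is only one of the policies listed above; reporting an error or throwing would be equally defensible.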
What should happen if std::cout is imbued with a std::codecvt facet? Presumably that facet will expect incoming text to be in a particular encoding. Should UTF-8 text be transcoded to one of the execution character set, the locale dependent encoding, or the console/terminal encoding before being presented to the facet? If so, which one? Should the implementation have to be aware of whether the stream is connected to a console/terminal? What if the programmer wants to override the default and, for example, always write UTF-8?
These are rather difficult questions that we don't have good answers for. std::u8out has been suggested, as a way to explicitly opt-in to UTF-8, but doesn't solve the problems of expected standard output encoding, issues with codecvt facets, and other iostreams problems like implicit locale dependent formatting.
Personally, in order to provide good Unicode support going forward, I think we're going to have to invest in a replacement for iostreams that 1) provides byte output with text support layered on top, 2) is encoding aware (in the text layer), 3) is locale independent (but with explicit opt-in support for locale dependent formatting like that provided by std::format), 4) is more performant than iostreams.
SG16 would like to hear your thoughts and suggestions. See https://github.com/sg16-unicode/sg16 for contact information.
Answer 2:
printf is not defined by C++20 itself; C++20 includes the C standard library by reference. It will likely reference C18, but that's substantially equal to C11 (no new features; just fixes defect reports).
Answer 3:
You wrote that the std::cout advice does not compile anymore.
For me it compiles well (I tested on experimental GCC 10.0.0 on Wandbox) but does not print what you might expect/want.
I have read an SO answer stating that char8_t has the same underlying representation as unsigned char, although they are distinct types (char8_t is not a typedef of unsigned char).
Knowing this, you could write something like this overload:
#include <iostream>

std::ostream & operator<<(std::ostream & os, const char8_t & c8)
{
    return os << static_cast<unsigned char>(c8);
}
Then you should be able to write something like:
char8_t a = 'u';
std::cout << a << std::endl;
And it will output:
u
instead of
117
I did the test here.
I think you should be able to do something equivalent for char8_t * (edit: example here).
Please let me know if I did not catch your point.
Source: https://stackoverflow.com/questions/58878651/what-is-the-printf-formatting-character-for-char8-t