Displaying extended ASCII characters

孤城傲影 2020-12-29 00:37

In Visual Studio 2005 on 32-bit Windows, why doesn't my console display characters from 128 to 255?

For example:

cout << "¿" << endl;


        
4 answers
  •  借酒劲吻你
    2020-12-29 01:00

    A Windows console window is pure Unicode. Its buffer stores text as UCS-2 Unicode (16 bits per character, essentially the original Unicode, which corresponds to the Basic Multilingual Plane of modern 21-bit Unicode). So a console window can present almost all kinds of text.

    However, for single-byte-per-character I/O (and possibly also for some variable-length encodings), Windows automatically translates to and from the console window's active codepage. If the console window is a cmd.exe instance, you can inspect the active codepage with the command chcp, short for "change codepage". Like this:

    C:\test> chcp
    Active code page: 850
    
    C:\test> _
    

    Codepage 850 is an encoding based on the original IBM PC English codepage 437. 850 is the default for console windows on at least Norwegian PCs (although savvy Norwegians may change that to 865). None of those are codepages that you should use, however.
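
The same information can also be queried from inside a program. The following is a minimal sketch, assuming the Win32 functions GetConsoleOutputCP, GetACP and GetOEMCP; the struct and function names (CodePages, query_code_pages, print_code_pages) are hypothetical and chosen only for this illustration. On non-Windows builds the sketch simply reports 0 for every field.

```cpp
#include <iostream>

#ifdef _WIN32
#include <windows.h>
#endif

// Code page numbers relevant to console output; 0 means "not available".
struct CodePages {
    unsigned console_output;  // active output code page of this console
    unsigned ansi;            // system ANSI code page
    unsigned oem;             // system default OEM code page
};

// Queries the three code pages via the Win32 API. On non-Windows builds
// there is nothing to query, so every field stays 0.
CodePages query_code_pages() {
    CodePages cp = {0, 0, 0};
#ifdef _WIN32
    cp.console_output = GetConsoleOutputCP();
    cp.ansi = GetACP();
    cp.oem = GetOEMCP();
#endif
    return cp;
}

// Prints the values, one per line.
void print_code_pages(std::ostream& out) {
    const CodePages cp = query_code_pages();
    out << "Console output code page: " << cp.console_output << '\n'
        << "System ANSI code page:    " << cp.ansi << '\n'
        << "System OEM code page:     " << cp.oem << '\n';
}
```

Calling print_code_pages(std::cout) from main is enough to try it. If the console output code page and the ANSI code page differ, narrow strings written with cout are being reinterpreted on the way to the screen.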

    The original IBM PC codepage (character encoding) is known as OEM, short for Original Equipment Manufacturer, an acronym that says nothing about encodings. It had nice line drawing characters suitable for the original PC's text mode screen. More generally, OEM means the default code page for console windows, where codepage 437 is just the original one; it can be configured, e.g. per window via chcp.

    When Microsoft created 16-bit Windows they chose another encoding known in Windows as ANSI. The original one was an extension of ISO Latin-1 which for a long while was the default on the Internet (however, it's unclear which came first: Microsoft participated in the standardization). This original ANSI is now known as Windows ANSI Western.

    ANSI is the code page used for non-Unicode by almost all the rest of Windows. Console windows use OEM. Notepad, other editors, and so on, use ANSI.

    Then, when Microsoft made Windows 32-bit, they adopted a 16-bit extension of Latin-1 known as Unicode. Microsoft was a founding member of the Unicode Consortium. The basic API, including console windows, the file system, etc., was rewritten to use Unicode. For backward compatibility there is a translation layer that translates between OEM and Unicode for console windows, and between ANSI and Unicode for other functionality. For example, MessageBoxA is an ANSI wrapper for the Unicode-based MessageBoxW.

    The practical upshot is that in Windows your C++ source code is typically encoded with ANSI, while console windows assume OEM. That makes, for example,

    cout << "I like Norwegian blåbærsyltetøy!" << endl;
    

    produce pure gobbledegook… You can use the Unicode-based console window APIs to output Unicode directly to a console window, avoiding the translation, but that's awkward.
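
As a rough illustration of the Unicode-based route, here is a minimal sketch that assumes the Win32 functions GetStdHandle, GetConsoleMode and WriteConsoleW. The helper name write_console_wide is hypothetical. The sketch only works when standard output really is a console window; for redirected output, and on non-Windows builds, it returns false and writes nothing.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Writes UTF-16 text straight into the console buffer, bypassing the
// code page translation. Returns false if standard output is not a
// console (for example when redirected to a file), if the write fails,
// or on non-Windows builds.
bool write_console_wide(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode)) {
        return false;  // not attached to a console window
    }
    DWORD written = 0;
    const DWORD length = static_cast<DWORD>(text.size());
    if (!WriteConsoleW(out, text.c_str(), length, &written, NULL)) {
        return false;
    }
    return written == length;
#else
    (void)text;  // nothing to do without the Win32 console API
    return false;
#endif
}
```

A call such as write_console_wide(L"bl\u00E5b\u00E6rsyltet\u00F8y\n") spells the national characters as universal character names, so the result does not depend on how the source file itself is encoded. Needing a separate code path for redirected output is part of what makes this approach awkward.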

    Note that using wcout instead of cout doesn't help: by design wcout just translates wide character strings down to the program's narrow character set, discarding information on the way. It can be hard to believe that the C++ standard library offers a rather big chunk of very complex functionality that is meaningless (since those conversions could instead just have been supported by cout), but so it is. Possibly it was some kind of political compromise. Either way, wcout does not help, even though logically it "should" if it were meaningful in some way.

    So how does a Norwegian novice programmer get e.g. "blåbærsyltetøy" presented?

    Well, simply by changing the active code page to ANSI. Since ANSI is codepage 1252 on most PCs in Western countries, you can do that for a given command interpreter instance with

    C:\test> chcp 1252
    Active code page: 1252
    
    C:\test> _
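
A program can attempt the same switch for its own console instead of relying on chcp. The following is a minimal sketch, assuming the Win32 functions GetConsoleOutputCP, SetConsoleOutputCP and GetACP; the function names are hypothetical. It uses GetACP rather than a hard-coded 1252 so that it follows whatever the system ANSI code page happens to be.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Sets the console output code page to the system ANSI code page.
// Returns the previous output code page so the caller can restore it,
// or 0 if the switch failed or this is a non-Windows build.
unsigned switch_console_output_to_ansi() {
#ifdef _WIN32
    const UINT previous = GetConsoleOutputCP();
    if (previous != 0 && SetConsoleOutputCP(GetACP())) {
        return previous;
    }
#endif
    return 0;
}

// Restores a code page previously returned by
// switch_console_output_to_ansi(). Returns true on success.
bool restore_console_output(unsigned previous) {
#ifdef _WIN32
    return previous != 0 && SetConsoleOutputCP(previous) != 0;
#else
    (void)previous;  // nothing to restore without the Win32 console API
    return false;
#endif
}
```

A typical use would be to call switch_console_output_to_ansi() at the start of main, write the ANSI-encoded text with cout, and pass the returned value to restore_console_output() before exiting.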
    

    Now old DOS programs such as edit.com (still present in Windows XP!) will produce some gobbledegook, because the original PC character set's line drawing characters are not there in ANSI, and because national characters have different codes in ANSI. But hey, who uses old DOS programs? Not me!
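
To make "different codes" concrete, here is a small portable sketch listing the byte values of the three Norwegian letters å, æ and ø in codepage 850 and in codepage 1252, as given in the published tables for those code pages. The struct, table and function names are hypothetical, and the lookup deliberately covers only these three letters.

```cpp
#include <cstddef>
#include <cstdio>

// One national character and its single-byte code in each code page.
struct LetterCodes {
    unsigned code_point;    // Unicode code point
    unsigned char cp850;    // byte value in OEM code page 850
    unsigned char cp1252;   // byte value in Windows ANSI Western (1252)
};

// The three Norwegian letters. In 1252 the byte equals the low byte of
// the Unicode code point; in 850 it does not.
static const LetterCodes kLetters[] = {
    {0x00E5, 0x86, 0xE5},  // a with ring above
    {0x00E6, 0x91, 0xE6},  // ae ligature
    {0x00F8, 0x9B, 0xF8}   // o with stroke
};
static const std::size_t kLetterCount = sizeof(kLetters) / sizeof(kLetters[0]);

// Returns the code page 850 byte for a code page 1252 byte, or 0 if the
// byte is not one of the three letters in the table above.
unsigned char cp1252_to_cp850(unsigned char ansi_byte) {
    for (std::size_t i = 0; i < kLetterCount; ++i) {
        if (kLetters[i].cp1252 == ansi_byte) {
            return kLetters[i].cp850;
        }
    }
    return 0;
}

// Prints one line per letter with both byte values in hexadecimal.
void print_letter_codes() {
    for (std::size_t i = 0; i < kLetterCount; ++i) {
        std::printf("U+%04X  cp850: 0x%02X  cp1252: 0x%02X\n",
                    kLetters[i].code_point,
                    static_cast<unsigned>(kLetters[i].cp850),
                    static_cast<unsigned>(kLetters[i].cp1252));
    }
}
```

A byte that means å in one code page therefore means something else in the other: 0xE5 is å in 1252 but Õ in 850, which is the kind of substitution that shows up as gobbledegook.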

    If you want this as a more permanent code page, you'll have to change the configuration of console windows via an undocumented registry key:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage

    In this key, change the value of OEMCP to 1252, and reboot.

    As with chcp, or any other change of codepage to 1252, this makes old DOS programs present gobbledegook, but makes C++ programs and other modern console programs work OK, since you then have the same character encoding in console windows as in the rest of Windows.
