Why is csc.exe crashing when I last left the output encoding as UTF8?

Deadly, posted on 2019-11-29 14:46:21

Well, you found a bug in the way the C# compiler deals with having to output text to the console when it is switched to UTF-8. It has a self-diagnostic to ensure the conversion from a UTF-16 encoded string to the console output code page worked correctly, and it slams the Big Red Button when it didn't. The stack trace looks like this:

csc.exe!OnCriticalInternalError()  + 0x4 bytes  
csc.exe!ConsoleOutput::WideToConsole()  + 0xdc51 bytes  
csc.exe!ConsoleOutput::print_internal()  + 0x2c bytes   
csc.exe!ConsoleOutput::print()  + 0x80 bytes    
csc.exe!ConsoleOutput::PrintString()  + 0xb5 bytes  
csc.exe!ConsoleOutput::PrintBanner()  + 0x50 bytes  
csc.exe!_main()  + 0x2d0eb bytes    

The actual code for WideToConsole() is not available; the closest match is this version from the SSCLI20 distribution:

/*
 * Like WideCharToMultiByte, but translates to the console code page. Returns length,
 * INCLUDING null terminator.
 */
int ConsoleOutput::WideCharToConsole(LPCWSTR wideStr, LPSTR lpBuffer, int nBufferMax)
{
    if (m_fUTF8Output) {
        if (nBufferMax == 0) {
            return UTF8LengthOfUnicode(wideStr, (int)wcslen(wideStr)) + 1; // +1 for nul terminator
        }
        else {
            int cchConverted = NULL_TERMINATED_MODE;
            return UnicodeToUTF8 (wideStr, &cchConverted, lpBuffer, nBufferMax);
        }

    }
    else {
        return WideCharToMultiByte(GetConsoleOutputCP(), 0, wideStr, -1, lpBuffer, nBufferMax, 0, 0);
    }
}

/*
 * Convert Unicode string to Console ANSI string allocated with VSAlloc
 */
HRESULT ConsoleOutput::WideToConsole(LPCWSTR wideStr, CAllocBuffer &buffer)
{
    int cch = WideCharToConsole(wideStr, NULL, 0);
    buffer.AllocCount(cch);
    if (0 == WideCharToConsole(wideStr, buffer.GetData(), cch)) {
        VSFAIL("How'd the string size change?");
        // We have to NULL terminate the output because WideCharToMultiByte didn't
        buffer.SetAt(0, '\0');
        return E_FAIL;
    }
    return S_OK;
}

The crash occurs somewhere around the VSFAIL() assert, judging from the machine code; I can see the return E_FAIL statement. The shipping code was however changed from the version I posted: the if() statement was modified and it looks like VSFAIL() was replaced by RETAILVERIFY(). Something broke when they made those changes, probably in UnicodeToUTF8(), which is now named UTF16ToUTF8(). To re-emphasize, the version I posted does not in fact crash; you can see for yourself by running C:\Windows\Microsoft.NET\Framework\v2.0.50727\csc.exe. Only the v4 version of csc.exe has this bug.
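To make the invariant behind that check concrete, here is a minimal, portable sketch of the same measure-then-convert pattern. It is not the csc.exe code: the names measureUtf8() and convertChecked() are made up for illustration, and it uses std::u16string instead of the Win32 types. The point is only that the length pass and the conversion pass must agree, which is what the diagnostic in WideToConsole() verifies.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 bytes needed for one code point.
static std::size_t utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical helper: decode one code point from UTF-16 starting at s[i],
// advancing i. Unpaired surrogates are replaced by U+FFFD.
static char32_t decodeUtf16(const std::u16string& s, std::size_t& i) {
    const char16_t u = s[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((char32_t(u) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
        }
    }
    if (u >= 0xD800 && u <= 0xDFFF) return 0xFFFD;
    return u;
}

// Pass 1: measure how many UTF-8 bytes the string needs (no terminator).
std::size_t measureUtf8(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size();) n += utf8Length(decodeUtf16(s, i));
    return n;
}

// Pass 2: convert, then verify that the number of bytes written matches
// the measured size. Returns false if the two passes disagree.
bool convertChecked(const std::u16string& s, std::string& out) {
    const std::size_t expected = measureUtf8(s);
    out.clear();
    out.reserve(expected);
    for (std::size_t i = 0; i < s.size();) {
        const char32_t cp = decodeUtf16(s, i);
        switch (utf8Length(cp)) {
        case 1:
            out += static_cast<char>(cp);
            break;
        case 2:
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
            break;
        case 3:
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
            break;
        default:
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
            break;
        }
    }
    return out.size() == expected;
}
```

In this sketch both passes share utf8Length(), so they cannot disagree. A check like this only fires when the measuring and converting routines drift apart, which is the kind of regression suspected above.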

The actual bug is hard to dig out from the machine code, so it is best to let Microsoft worry about that. You can file the bug at connect.microsoft.com. I don't see a report that resembles it, which is fairly remarkable, by the way. The workaround for this bug is to use CHCP to change the code page back.

sstan

Several articles hint that the Windows console has many Unicode-related bugs, for example: https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

Here is one workaround that works for me. Instead of:

csc aaa1.cs

Try this (which will redirect the CSC output to a file):

csc /utf8output aaa1.cs > aaa1-compilation.log

Relevant documentation: https://msdn.microsoft.com/en-us/library/d5bxd1x2.aspx

In some international configurations, compiler output cannot correctly be displayed in the console. In these configurations, use /utf8output and redirect compiler output to a file.

added by barlop

From the discussion in chat, we found that after running csc uuu1.cs<ENTER> and then uuu1<ENTER>, every subsequent csc invocation has to be run with /utf8output AND (for some unknown reason) with a redirect in order not to crash, e.g. csc /utf8output uuu1.cs >asdfsdaf

Han's workaround is better though: just run chcp 850 (or whatever code page you use) after the uuu1<ENTER>. Even if chcp already reports 850, you still have to run chcp 850. Then csc will run normally.

The reason you should run chcp 850 even when chcp already shows 850 is that chcp only displays the input encoding, whereas chcp 850 changes both the input encoding and the output encoding, and it is the output encoding that needs to change. So chcp can show 850 while your output encoding is 65001, and the issue only occurs when the output encoding is 65001.
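If you want to see both values rather than infer them from chcp, a small program can query them. The following is a rough sketch, not a tested tool: it uses the Win32 console functions GetConsoleCP(), GetConsoleOutputCP() and SetConsoleOutputCP(), while the helper names (outputIsUtf8, describeCodePages, resetOutputCodePageIfUtf8) are invented for illustration. The Win32 part only compiles on Windows; the two pure helpers build anywhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// 65001 is the Windows code page identifier for UTF-8 (CP_UTF8).
constexpr unsigned kUtf8CodePage = 65001;

// Hypothetical helper: true when the console OUTPUT code page is UTF-8,
// the condition associated with the csc.exe crash in this thread.
bool outputIsUtf8(unsigned outputCp) {
    return outputCp == kUtf8CodePage;
}

// Hypothetical helper: one-line summary of the input and output code pages.
std::string describeCodePages(unsigned inputCp, unsigned outputCp) {
    std::string s = "input=" + std::to_string(inputCp) +
                    " output=" + std::to_string(outputCp);
    if (inputCp != outputCp) s += " (mismatch)";
    return s;
}

#ifdef _WIN32
// Hypothetical helper: if the output code page is UTF-8, switch it to
// fallbackCp (e.g. 850). Returns false only if SetConsoleOutputCP fails.
bool resetOutputCodePageIfUtf8(unsigned fallbackCp) {
    if (!outputIsUtf8(GetConsoleOutputCP())) return true;  // nothing to do
    return SetConsoleOutputCP(fallbackCp) != 0;
}
#endif
```

On Windows, calling describeCodePages(GetConsoleCP(), GetConsoleOutputCP()) after running uuu1 should show whether the output code page was left at 65001 while the input one still reads 850.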
