Python, Windows console and encodings (cp850 vs cp1252)

忘掉有多难 2020-12-13 11:00

I thought I knew everything about encodings and Python, but today I came across a weird problem: although the console is set to code page 850 - and Python reports it correct

2 Answers
  • 2020-12-13 11:27

    Replying to myself:

    On Windows, the encoding used by the console (thus, that of sys.stdin/out) differs from the encoding of various OS-provided strings - obtained through e.g. os.getenv(), sys.argv, and certainly many more.

    The encoding returned by sys.getdefaultencoding() is really just that: a default, chosen by the Python developers as the "most reasonable encoding" for the interpreter to use in extreme cases. I get 'ascii' on my Python 2.6, and portable Python 3.1 yields 'utf-8'. Neither is what we are looking for; they are merely fallbacks for the encoding conversion functions.
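
To see the difference on a given machine, the encodings can be printed side by side. This is only a minimal sketch: report_encodings is a hypothetical helper name, and the values it returns depend on the platform, the Python version and the console settings.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports, keyed by where they apply."""
    return {
        # Fallback for implicit str/bytes conversions, not the console encoding.
        "default": sys.getdefaultencoding(),
        # Encoding of the console streams; can be None (for example when
        # output is redirected under Python 2, or when there is no stream).
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        # Locale-derived encoding; on Windows this normally follows the ACP.
        "preferred": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```

On the setup described in the question, one would expect "stdout" to show cp850 and "preferred" to show cp1252.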

    As this page seems to state, the encoding used by OS-provided strings is governed by the Active Code Page (ACP). Since Python does not have a native function to retrieve it, I had to use ctypes:

    from ctypes import cdll
    # GetACP() returns the numeric ANSI code page, e.g. 1252 (Windows only)
    os_encoding = 'cp' + str(cdll.kernel32.GetACP())
    

    Edit: But as Jacek suggests, there actually is a more robust and Pythonic way to do it (semantics would need validation, but until proven wrong, I'll use this)

    import locale
    os_encoding = locale.getpreferredencoding()
    # This returns 'cp1252' on my system, yay!
    

    and then

    import os, sys
    # Python 2 only: sys.argv and os.getenv() return byte strings there
    u_argv = [x.decode(os_encoding) for x in sys.argv]
    u_env = os.getenv('myvar').decode(os_encoding)
    

    On my system, os_encoding = 'cp1252', so it works. I am quite certain this would break on other platforms, so feel free to edit and make it more generic. We would certainly need some kind of translation table between the ACP reported by Windows and the Python encoding name - something better than just prepending 'cp'.
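
One possible shape for such a translation table is sketched below. The names CODEPAGE_OVERRIDES and codepage_to_codec are hypothetical, the override list covers only a handful of well-known code pages, and the 'mbcs' fallback is an assumption that only makes sense on Windows, where that codec uses the ANSI code page.

```python
import codecs

# Hypothetical override table: code pages whose Python codec name is not
# simply "cp<N>". Only a few well-known entries are listed.
CODEPAGE_OVERRIDES = {
    65001: "utf-8",
    1200: "utf-16-le",
    1201: "utf-16-be",
    20127: "ascii",
    28591: "latin-1",
}


def codepage_to_codec(codepage, fallback="mbcs"):
    """Return a Python codec name for a Windows code page number."""
    candidate = CODEPAGE_OVERRIDES.get(codepage, "cp%d" % codepage)
    try:
        # codecs.lookup raises LookupError when Python has no such codec.
        return codecs.lookup(candidate).name
    except LookupError:
        # The fallback is returned as-is, without checking that it exists.
        return fallback
```

On Windows, the value from GetACP() could be passed straight in, for example codepage_to_codec(cdll.kernel32.GetACP()). Checking the candidate with codecs.lookup avoids handing an unknown name such as 'cp99999' to decode().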

    This is unfortunately a hack, although I find it a bit less intrusive than the one suggested by this ActiveState Code Recipe (linked to by the SO question mentioned in Edit 2 of my question). The advantage I see here is that it can be applied to os.getenv(), and not only to sys.argv.
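
A slightly more defensive variant of the decoding step might look like the sketch below. The helper names to_text, text_argv and text_getenv are hypothetical. The sketch assumes that values which are already text (as on Python 3) should pass through untouched, and that an unset environment variable should yield None instead of raising.

```python
import os
import sys


def to_text(value, encoding, errors="replace"):
    """Decode byte strings with the given encoding; pass text through."""
    if value is None:
        return None
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding, errors)
    return value


def text_argv(encoding):
    """Return sys.argv with every byte-string argument decoded."""
    return [to_text(arg, encoding) for arg in sys.argv]


def text_getenv(name, encoding, default=None):
    """Like os.getenv(), but decoded; an unset variable gives `default`."""
    return to_text(os.environ.get(name, default), encoding)
```

With this, text_getenv('myvar', os_encoding) replaces os.getenv('myvar').decode(os_encoding) and no longer fails when myvar is not set.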

  • 2020-12-13 11:33

    I tried the solutions above, but some encoding problems may remain. The console also needs a TrueType font. The fix:

    1. Run chcp 65001 in cmd to change the encoding to UTF-8.
    2. Change the cmd font to a TrueType one, such as Lucida Console, which also supports the code pages that were in use before 65001.

    Here's my complete fix for the encoding error:

    def fixCodePage():
        import os
        import sys
        import codecs
        import ctypes
        if sys.platform == 'win32':
            if sys.stdout.encoding != 'cp65001':
                os.system("echo off")
                os.system("chcp 65001") # Change the active code page to UTF-8
                sys.stdout.write("\x1b[A") # ANSI cursor-up, so the chcp output gets overwritten
                sys.stdout.flush()
            LF_FACESIZE = 32
            STD_OUTPUT_HANDLE = -11
            class COORD(ctypes.Structure):
                _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]
    
            class CONSOLE_FONT_INFOEX(ctypes.Structure):
                _fields_ = [("cbSize", ctypes.c_ulong),
                ("nFont", ctypes.c_ulong),
                ("dwFontSize", COORD),
                ("FontFamily", ctypes.c_uint),
                ("FontWeight", ctypes.c_uint),
                ("FaceName", ctypes.c_wchar * LF_FACESIZE)]
    
            font = CONSOLE_FONT_INFOEX()
            font.cbSize = ctypes.sizeof(CONSOLE_FONT_INFOEX)
            font.nFont = 12
            font.dwFontSize.X = 7
            font.dwFontSize.Y = 12
            font.FontFamily = 54
            font.FontWeight = 400
            font.FaceName = "Lucida Console"
            handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
            ctypes.windll.kernel32.SetCurrentConsoleFontEx(handle, ctypes.c_long(False), ctypes.pointer(font))
    

    Note: You can see a font change while executing the program.
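
As an alternative to shelling out to chcp, the code page can also be switched through the console API directly. This is only a sketch under stated assumptions: set_console_utf8 is a hypothetical helper name, it does nothing outside Windows, and it does not touch the font. GetConsoleOutputCP, SetConsoleOutputCP and SetConsoleCP are kernel32 functions.

```python
import ctypes
import sys

CP_UTF8 = 65001  # Windows identifier for the UTF-8 code page


def set_console_utf8():
    """Switch the console to UTF-8.

    Returns the previous output code page, or None when not on Windows
    or when the switch failed.
    """
    if sys.platform != "win32":
        return None  # nothing to do outside Windows
    kernel32 = ctypes.windll.kernel32
    previous = kernel32.GetConsoleOutputCP()
    if previous != CP_UTF8:
        # SetConsoleOutputCP returns zero on failure (e.g. no console).
        if not kernel32.SetConsoleOutputCP(CP_UTF8):
            return None
        # Keep the input code page in line with the output code page.
        kernel32.SetConsoleCP(CP_UTF8)
    return previous
```

Because nothing is printed, there is no chcp output to hide afterwards. The returned value can be kept and passed to SetConsoleOutputCP later to restore the original code page.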
