I'm having some trouble getting Unicode to work for git-bash (on Windows 7). I have tried many things without success. Although, I'm not quite sure what is responsible to
The problem with chcp 65001 is that there are bugs in the C runtime (MSVCRT) that make stdio calls return inconsistent results when run under code page 65001.
That should be better with Git 2.23 (Q3 2019).
See commit 090d1e8 (03 Jul 2019) by Karsten Blees (kblees).
(Merged by Junio C Hamano -- gitster -- in commit 0328db0, 11 Jul 2019)
gettext: always use UTF-8 on native Windows
On native Windows, Git exclusively uses UTF-8 for console output (both with MinTTY and native Win32 Console).
Gettext uses setlocale() to determine the output encoding for translated text, however, MSVCRT's setlocale() does not support UTF-8. As a result, translated text is encoded in system encoding (as per GetACP()), and non-ASCII chars are mangled in console output.

Side note: There is actually a code page for UTF-8: 65001.
In practice, it does not work as expected at least on Windows 7, though, so we cannot use it in Git. Besides, if we overrode the code page, any process spawned from Git would inherit that code page (as opposed to the code page configured for the current user), which would quite possibly break e.g. diff or merge helpers. So we really cannot override the code page.

In init_gettext_charset(), Git calls gettext's bind_textdomain_codeset() with the character set obtained via locale_charset(); Let's override that latter function to force the encoding to UTF-8 on native Windows.

In Git for Windows' SDK, there is a libcharset.h and therefore we define HAVE_LIBCHARSET_H in the MINGW-specific section in config.mak.uname, therefore we need to add the override before that conditionally-compiled code block.

Rather than simply defining locale_charset() to return the string "UTF-8", though, we are careful not to break LC_ALL=C: the ab/no-kwset patch series, for example, needs to have a way to prevent Git from expecting UTF-8-encoded input.
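To make the last point of that commit message more concrete, here is a rough sketch of the decision it describes: default to UTF-8, but still honor an explicit locale such as LC_ALL=C. This is not Git's actual C code. The function name sketch_locale_charset, the order in which the variables are checked, and the "take whatever follows the dot" rule are all assumptions made for illustration, and it is written in Python only so it can be tried anywhere.

```python
import os


def sketch_locale_charset(environ=os.environ):
    """Hypothetical stand-in for an overridden locale_charset().

    Returns "UTF-8" unless the environment explicitly names a locale,
    so that LC_ALL=C keeps working as an opt-out.
    """
    # Assumed priority order: LC_ALL, then LC_CTYPE, then LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:  # skip unset and empty variables
            break
    else:
        # Nothing was set: force UTF-8 rather than asking the C runtime.
        return "UTF-8"

    # "de_DE.ISO-8859-1" yields "ISO-8859-1"; a bare "C" is returned as-is.
    _, dot, charset = value.partition(".")
    return charset if dot else value


if __name__ == "__main__":
    print(sketch_locale_charset({}))
    print(sketch_locale_charset({"LC_ALL": "C"}))
```

The point of the sketch is only that the override is a default, not a hard-coded constant: with no locale variables set the answer is UTF-8, while LC_ALL=C still comes back as C.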
And:
See commit 697bdd2 (04 Jul 2019), and commit 9423885, commit 39a98e9 (27 Jun 2019) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 0a2ff7c, 11 Jul 2019)
mingw: use Unicode functions explicitly

Many Win32 API functions actually exist in two variants: one with the A suffix that takes ANSI parameters (char * or const char *) and one with the W suffix that takes Unicode parameters (wchar_t * or const wchar_t *).

The ANSI variant assumes that the strings are encoded according to whatever is the current locale. This is not what Git wants to use on Windows: we assume that char * variables point to strings encoded in UTF-8.

There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale.
Further, it is actually highly encouraged to use the Unicode versions instead of the ANSI versions, so let's do precisely that.
Note: when calling the Win32 API functions without any suffix, it depends whether the UNICODE constant is defined before the relevant headers are #include'd.
Without that constant, the ANSI variants are used.
Let's be explicit and avoid that ambiguity.
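For anyone wondering why the A variants are a problem when the strings are UTF-8, the mismatch can be reproduced without any Windows API at all. The snippet below is only an illustration: it assumes the ANSI code page is cp1252 (the real one depends on the user's system), and the helper names are made up. It shows what an ANSI-style interpretation does to UTF-8 bytes, and what the explicit route (decode UTF-8, hand over UTF-16 code units, as a W function expects) looks like.

```python
ASSUMED_ANSI_CODE_PAGE = "cp1252"  # assumption: varies per system


def as_seen_by_ansi_variant(utf8_bytes):
    """Interpret UTF-8 bytes the way an 'A' function would (wrong charset)."""
    return utf8_bytes.decode(ASSUMED_ANSI_CODE_PAGE)


def to_wide_string(utf8_bytes):
    """Decode UTF-8 explicitly and produce UTF-16LE code units,
    the in-memory form of wchar_t strings on Windows."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    text = "M\u00fcller"  # "Mueller" written with u-umlaut
    utf8_bytes = text.encode("utf-8")

    # The two bytes of the u-umlaut are read as two separate characters.
    print(ascii(as_seen_by_ansi_variant(utf8_bytes)))

    # The wide-string route round-trips without loss.
    print(to_wide_string(utf8_bytes).decode("utf-16-le") == text)
```

The first call turns one non-ASCII character into two unrelated ones, which is the kind of mangling described above; the second keeps the text intact because the encoding is stated explicitly instead of being taken from the current locale.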