Git and the Umlaut problem on Mac OS X

Posted by て烟熏妆下的殇ゞ on 2019-11-26 05:59:54

Question


Today I discovered a bug in Git on Mac OS X.

For example, I want to commit a file named überschrift.txt, which begins with the German special character Ü. The command git status gives me the following output.

Users-iMac: user$ git status

On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   "U\314\210berschrift.txt"
nothing added to commit but untracked files present (use "git add" to track)

It seems that Git 1.7.2 has a problem with German special characters on Mac OS X. Is there a way to get Git to read the file names correctly?


Answer 1:


Enable core.precomposeunicode on the Mac:

git config --global core.precomposeunicode true

For this to work, you need to have at least Git 1.8.2.

Mountain Lion ships with Git 1.7.5. To get a newer Git, use either git-osx-installer or Homebrew (requires Xcode).

That's it.
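
Before relying on the setting, it may be worth checking that the installed Git is new enough. The following is a minimal sketch, assuming that git --version prints a plain dotted numeric version as its third word (for example "git version 2.39.3"). The names version_ge and enable_precompose are hypothetical helpers, not part of Git.

```shell
# version_ge A B: succeed if dotted numeric version A is >= B.
version_ge() {
    va=$1
    vb=$2
    while [ -n "$va" ] || [ -n "$vb" ]; do
        fa=${va%%.*}    # leading field of A (empty once A is exhausted)
        fb=${vb%%.*}    # leading field of B
        if [ "${fa:-0}" -gt "${fb:-0}" ]; then return 0; fi
        if [ "${fa:-0}" -lt "${fb:-0}" ]; then return 1; fi
        # Fields are equal: drop them and compare the next pair.
        case $va in *.*) va=${va#*.} ;; *) va= ;; esac
        case $vb in *.*) vb=${vb#*.} ;; *) vb= ;; esac
    done
    return 0
}

# enable_precompose: set the option only if Git is at least 1.8.2
# (the minimum version stated in this answer).
enable_precompose() {
    installed=$(git --version | awk '{print $3}')
    if version_ge "$installed" 1.8.2; then
        git config --global core.precomposeunicode true
    else
        echo "Git $installed is too old; upgrade first" >&2
        return 1
    fi
}
```

Calling enable_precompose either applies the setting or reports that an upgrade is needed. Version strings with non-numeric fields are outside what this sketch handles.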




Answer 2:


The cause is that file systems differ in how they store file names.

In Unicode, Ü can be represented in two ways: as the single precomposed character Ü, or as U followed by a combining diaeresis. A Unicode string may contain either form, but because mixing them is confusing, a file name is normally kept in one form only: every umlauted U is stored either as Ü or as U + "combining umlaut character".

Linux file names are conventionally in the former, Normalization Form C (NFC, composed), although the file system itself stores whatever bytes it is given. Mac OS X uses the latter, Normalization Form D (NFD, decomposed), and converts names to it.

Apparently Git doesn't care about this point and simply uses the byte sequence of the filename, which leads to the problem you're having.
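
To make the difference concrete, here is a small sketch that builds both spellings of Ü from their UTF-8 bytes and compares them the way a byte-oriented program would. The helper name hex_bytes is hypothetical; the octal escapes are the UTF-8 encodings of U+00DC and of U followed by U+0308.

```shell
# hex_bytes STRING: print the bytes of STRING as lowercase hex, no separators.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

nfc=$(printf '\303\234')     # precomposed: U+00DC
nfd=$(printf 'U\314\210')    # decomposed: U+0055 followed by U+0308

hex_bytes "$nfc"; echo       # c39c
hex_bytes "$nfd"; echo       # 55cc88

# A plain string comparison sees two different names.
if [ "$nfc" = "$nfd" ]; then echo same; else echo different; fi
```

Both strings display as the same letter, yet they differ in content and length, which is why a byte-wise comparison of file names treats them as unrelated.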

The mailing list thread "Git, Mac OS X and German special characters" contains a patch that makes Git compare file names after normalization.




Answer 3:


Putting the following in ~/.gitconfig works for me on 10.12.1 Sierra for UTF-8 names:

[core]
    precomposeunicode = true
    quotepath = false

The first option makes Git convert the decomposed file names reported by the Mac back to their precomposed form, and the second stops Git from escaping non-ASCII characters in its output.
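
The same two settings can also be written with git config instead of editing the file by hand. This is a minimal sketch, assuming Git is installed. The helper name set_unicode_options is hypothetical, and it takes the config file as an argument so that it can be tried on a scratch file before touching ~/.gitconfig.

```shell
# set_unicode_options FILE: write both settings into the given config file.
set_unicode_options() {
    git config --file "$1" core.precomposeunicode true &&
    git config --file "$1" core.quotepath false
}

# Example: apply to the per-user configuration.
# set_unicode_options "$HOME/.gitconfig"
```

Git creates the [core] section itself when it writes the first key, so the section header does not need to exist beforehand.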




Answer 4:


To make git add work with umlauts in file names on Mac OS X, you can convert file path strings from composed to canonically decomposed UTF-8 using iconv.

# test case

mkdir testproject
cd testproject

git --version    # git version 1.7.6.1
locale charmap   # UTF-8

git init
file=$'\303\234berschrift.txt'    # composed UTF-8 (Linux-compatible)
touch "$file"
echo 'Hello, world!' > "$file"

# convert composed into canonically decomposed UTF-8
# cf. http://codesnippets.joyent.com/posts/show/12251
# printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac | LC_ALL=C vis -fotc 
#git add "$file"
git add "$(printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac)"  

git commit -a -m 'This is my commit message!'
git show
git status
git ls-files '*'
git ls-files -z '*' | tr '\0' '\n'

touch $'caf\303\251 1' $'caf\303\251 2' $'caf\303\251 3'
git ls-files --other '*'
git ls-files -z --other '*' | tr '\0' '\n'
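
The utf-8-mac encoding used above comes with the iconv shipped on Mac OS X and is typically missing on other systems (an assumption worth checking on your own machine). Here is a hedged sketch of a wrapper that probes for the encoding first; to_nfd is a hypothetical name.

```shell
# to_nfd STRING: print STRING converted to decomposed UTF-8 when this
# system's iconv knows the utf-8-mac encoding; otherwise print it unchanged.
to_nfd() {
    if printf 'x' | iconv -f utf-8 -t utf-8-mac >/dev/null 2>&1; then
        printf '%s' "$1" | iconv -f utf-8 -t utf-8-mac
    else
        printf '%s' "$1"
    fi
}

# Usage inside a repository:
# git add "$(to_nfd "$file")"
```

With the fallback, the same script can run on a Linux checkout, where the name is simply passed through as it is.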



Answer 5:


Change the repository's OS X-specific core.precomposeunicode flag to true:

git config core.precomposeunicode true

To make sure new repositories get that flag, also run:

git config --global core.precomposeunicode true

Here is the relevant snippet from the manpage:

This option is only used by Mac OS implementation of Git. When core.precomposeunicode=true, Git reverts the unicode decomposition of filenames done by Mac OS. This is useful when sharing a repository between Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7). When false, file names are handled fully transparent by Git, which is backward compatible with older versions of Git.




Answer 6:


The output is correct.

Your file name is in UTF-8, with Ü represented as LATIN CAPITAL LETTER U + COMBINING DIAERESIS (U+0308, UTF-8 0xcc 0x88) instead of LATIN CAPITAL LETTER U WITH DIAERESIS (U+00DC, UTF-8 0xc3 0x9c). The Mac OS X HFS+ file system decomposes Unicode in this way. Git, in turn, shows the non-ASCII bytes of the file name as octal escapes (\314\210 is 0xcc 0x88).
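
To see which bytes such a quoted name stands for, the octal escapes can be fed back through printf. This is a minimal sketch; unquote_octal is a hypothetical helper that only covers the \ooo escapes seen in the question's output.

```shell
# unquote_octal QUOTED: strip the surrounding double quotes and turn
# \ooo octal escapes back into raw bytes. Other escapes Git may emit
# (such as \" or \t) are not handled specially in this sketch.
unquote_octal() {
    p=${1#\"}
    p=${p%\"}
    # Double any % so printf does not treat it as a conversion specifier.
    p=$(printf '%s' "$p" | sed 's/%/%%/g')
    printf "$p"
}

# Dump the decoded name as hex; the first three bytes are 55 cc 88.
unquote_octal '"U\314\210berschrift.txt"' | od -An -tx1
```

The hex dump starts with 55 (the letter U) followed by cc 88 (the combining diaeresis), which confirms that the name on disk is in decomposed form.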

Note that Unicode filenames can make your repository non-portable. For example, msysgit has had problems dealing with Unicode filenames.




Answer 7:


I had a similar problem with my personal repository, so I wrote a helper script in Python 3. You can grab it here: https://github.com/sjtoik/umlaut-cleaner

The script needs a bit of manual labour, but not much.



Source: https://stackoverflow.com/questions/5581857/git-and-the-umlaut-problem-on-mac-os-x
