git diff shows unicode symbols in angle brackets

Posted by 点点圈 on 2019-11-29 00:57:09

Question


I have a file containing Unicode characters (Russian text). When I fix a typo, I use git diff --color-words=. to see the changes I've made.

With Unicode (Cyrillic) characters I get a mess of angle brackets, like so:

$ cat p1
привет

$ cat p2
Привет

$ git diff --color-words=. --no-index p1 p2
diff --git 1/p1 2/p2
index d0f56e1..d84c480 100644
--- 1/p1
+++ 2/p2
@@ -1 +1 @@
<D0><BF><9F>ривет

It looks like git diff --color-words=. compares bytes rather than characters, which is not what I expect.
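
A quick way to confirm the byte-level picture is to dump the UTF-8 encoding of the two letters that differ. This is only a sketch: utf8_hex is a hypothetical helper name, and it assumes a UTF-8 terminal and a POSIX od.

```shell
# Hypothetical helper (not part of git): print the bytes of a string as hex.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

utf8_hex 'п'   # lowercase letter from p1
utf8_hex 'П'   # uppercase letter from p2
```

Both letters share the lead byte d0 and differ only in the second byte (bf versus 9f), which matches the <D0><BF><9F> fragments in the diff output.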

Is there any way to tell git to work properly with unicode symbols?

Update about my environment: I get the same result on Mac OS and on a Linux host.

My shell vars are:

BASH=/bin/bash
HOSTTYPE=x86_64
LANG=ru_RU.UTF-8
OSTYPE=darwin10.0
PS1='\h:\W \u\$ '
SHELL=/bin/bash
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
TERM=xterm-256color
TERM_PROGRAM=iTerm.app
_=-l

I have reset git config to default settings like so:

$ git config -l
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
core.ignorecase=true

git version

$ git --version
git version 1.7.3.5

Answer 1:


For me, less (the git pager) was to blame (thanks @kostix). To check, disable the pager altogether:

git --no-pager diff p1 p2

My case was commit messages containing emojis; it's fundamentally the same problem though.

$ git log --oneline
93a1866 <U+1F43C>

$ git --no-pager log --oneline
93a1866 🐼

$ export LESS='--raw-control-chars'
$ git log --oneline
93a1866 🐼

$ git config --global core.pager 'less --raw-control-chars'
$ git log --oneline
93a1866 🐼

NB: the --RAW-CONTROL-CHARS option (-R) makes less pass through only ANSI color escapes; it still munges other control chars (emoji included), whereas --raw-control-chars (-r) passes everything through. My less is globally configured with --RAW-CONTROL-CHARS and my git pager with --raw-control-chars as above.




Answer 2:


For me, the best solution is to set export LESSCHARSET=utf-8.

With that set, both git log -p and git diff show Unicode without problems.
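
As a minimal sketch, assuming a Bourne-style shell and less as the git pager, the variable can be set for the current session or for a single command only:

```shell
# Tell less to treat its input as UTF-8 (affects the current shell session).
export LESSCHARSET=utf-8

# One-off form that leaves the environment untouched (run inside a repository):
#   LESSCHARSET=utf-8 git log -p
```

To keep the setting across sessions, the export line can go into your shell startup file (for example ~/.bashrc, if you use bash).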




Answer 3:


The solution for me was to use git difftool.

I wrote this tool, https://github.com/chestozo/dmp, based on https://code.google.com/p/google-diff-match-patch/.

Sometimes it also gives a better diff than git diff --color-words=. :)




Answer 4:


On several platforms, setting LANG to C.UTF-8 (or en_US.UTF-8, etc.) works:

$ echo '人' >test1.txt && echo '丁' >test2.txt
$ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
[-人-]{+丁+}

However, LANG doesn't seem to be honored on some platforms (such as Git for Windows):

$ echo '人' >test1.txt && echo '丁' >test2.txt
$ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
<E4>[-<BA><BA>-]{+<B8><81>+}

A workaround on these platforms is to spell out the byte pattern of a UTF-8 character in the regex passed to git diff (e.g. $'[^\x80-\xBF][\x80-\xBF]*' in place of '.'):

$ echo '人' >test1.txt && echo '丁' >test2.txt
$ git diff --no-index --word-diff=plain --word-diff-regex=$'[^\x80-\xBF][\x80-\xBF]*' -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
[-人-]{+丁+}
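
The byte-pattern workaround can be wrapped in a small shell function so the regex does not have to be retyped. This is only a sketch under some assumptions of mine: utf8_chardiff is a hypothetical name, the regex is built with printf octal escapes so it does not depend on bash's $'...' quoting, and LC_ALL=C plus --no-pager are additions that force byte-wise regex matching and keep a pager running in the C locale from re-escaping the output.

```shell
# Hypothetical helper (the name is made up, it is not a git command):
# word-diff two files one UTF-8 character at a time.
utf8_chardiff() {
    # One UTF-8 character = one non-continuation byte followed by zero or
    # more continuation bytes (0x80-0xBF, written here as octal 200-277).
    # LC_ALL=C (an assumption, not from the answer above) makes the regex
    # engine compare raw bytes whatever the caller's locale is.
    LC_ALL=C git --no-pager diff --no-index --word-diff=plain \
        --word-diff-regex="$(printf '[^\200-\277][\200-\277]*')" \
        -- "$1" "$2"
}
```

Usage: utf8_chardiff test1.txt test2.txt. Like git diff --no-index itself, the function exits with status 1 when the files differ.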



Answer 5:


I have seen many reports that xterm cannot print Unicode characters in some cases. That may at least be a starting point for a solution.



Source: https://stackoverflow.com/questions/17320721/git-diff-shows-unicode-symbols-in-angle-brackets
