I have a file with Unicode characters (Russian text).
When I fix a typo, I use git diff --color-words=. to see the changes I've made.
In case of unicode
For me, the best solution is to set export LESSCHARSET=utf-8.
With that in place, both git log -p and git diff show Unicode without problems.
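As a minimal sketch of that setting: the variable can be exported for the whole shell session or set for a single command. Making it permanent by adding the export line to a startup file such as ~/.bashrc is an assumption about your shell, not something stated in the answer.

```shell
# Tell less to treat its input as UTF-8 for the rest of this shell session.
export LESSCHARSET=utf-8

# Alternative: set it for a single command only (run inside a repository).
# LESSCHARSET=utf-8 git log -p
```

This only helps when less is the pager Git actually uses.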
The solution for me was to use git difftool.
I wrote this tool, https://github.com/chestozo/dmp, based on https://code.google.com/p/google-diff-match-patch/.
Sometimes it also gives a better diff than git diff --color-words=. :)
I have seen a lot of reports that xterm cannot print Unicode characters in some cases. Maybe that is at least a starting point for a solution.
For me, less (the Git pager) was to blame (thanks @kostix). Experiment by disabling the pager altogether:
git --no-pager diff p1 p2
My case was commit messages containing emojis; it's fundamentally the same problem though.
$ git log --oneline
93a1866 <U+1F43C>
$ git --no-pager log --oneline
93a1866 🐼
On several platforms, setting LANG to C.UTF-8 (or en_US.UTF-8, etc.) works:
$ echo '人' >test1.txt && echo '丁' >test2.txt
$ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
[-人-]{+丁+}
However, LANG doesn't seem to be honored on some platforms (such as Git for Windows):
$ echo '人' >test1.txt && echo '丁' >test2.txt
$ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
<E4>[-<BA><BA>-]{+<B8><81>+}
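The angle-bracket output above is easier to read once the bytes are written out. The sketch below dumps the UTF-8 encoding of both characters; it assumes od and tr are available and that the script itself is saved as UTF-8, and the helper name hex_bytes is made up for illustration. Both characters share the lead byte E4, and because '.' matched one byte at a time here, only the differing continuation bytes were marked as changed.

```shell
# Print the UTF-8 bytes of a string as lowercase hex without separators.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

ren_hex=$(hex_bytes '人')    # U+4EBA
ding_hex=$(hex_bytes '丁')   # U+4E01

echo "人 = $ren_hex"    # prints: 人 = e4baba
echo "丁 = $ding_hex"   # prints: 丁 = e4b881
```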
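A workaround on these platforms is to give git diff a regex written in raw UTF-8 bytes, e.g. $'[^\x80-\xBF][\x80-\xBF]*' in place of '.'. It matches one non-continuation byte followed by any number of continuation bytes (0x80-0xBF), i.e. exactly one UTF-8-encoded character: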
$ echo '人' >test1.txt && echo '丁' >test2.txt
$ git diff --no-index --word-diff=plain --word-diff-regex=$'[^\x80-\xBF][\x80-\xBF]*' -- test1.txt test2.txt
diff --git a/test1.txt b/test2.txt
index 3ef0891..3773917 100644
--- a/test1.txt
+++ b/test2.txt
@@ -1 +1 @@
[-人-]{+丁+}
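To see what the byte-level pattern matches independently of Git, here is a rough sketch that runs it through grep in the C locale. It assumes a grep that supports -a and -o (GNU grep does) and builds the pattern with printf octal escapes, where \200-\277 is the same byte range as \x80-\xBF. The variable names are only illustrative.

```shell
# One non-continuation byte followed by any continuation bytes (0x80-0xBF).
pattern=$(printf '[^\200-\277][\200-\277]*')

# In the C locale grep works on bytes, so each match is one whole character.
chars=$(printf '人丁\n' | LC_ALL=C grep -a -o "$pattern")

printf '%s\n' "$chars"   # one character per line
```

Each multi-byte sequence stays in one piece, which is why the word diff above reports whole characters instead of fragments.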