git diff shows unicode symbols in angle brackets

情话喂你 2020-12-14 02:39

I have a file containing Unicode characters (Russian text). When I fix a typo, I use git diff --color-words=. to see the changes I've made.

In case of unicode

5 Answers
  • 2020-12-14 03:07

    For me, the best solution was to set export LESSCHARSET=utf-8.

    With that set, both git log -p and git diff show Unicode without problems.
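
A minimal sketch of how that setting might be applied. The startup file name and the core.pager variant are assumptions about a typical setup, not something the answer states:

```shell
# Tell less to treat its input as UTF-8 for the current shell session.
export LESSCHARSET=utf-8

# To keep the setting across sessions, the same line can be appended to a
# shell startup file (assumption: bash with ~/.bashrc):
#   echo 'export LESSCHARSET=utf-8' >> ~/.bashrc

# Alternatively, scope it to git only by setting it inside the pager command
# (assumption: less is the pager you want):
#   git config --global core.pager 'LESSCHARSET=utf-8 less'
```

After this, git log -p and git diff can be run as usual in the same session.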

  • 2020-12-14 03:07

    The solution for me was to use git difftool.

    I wrote this tool https://github.com/chestozo/dmp based on https://code.google.com/p/google-diff-match-patch/.

    Sometimes it also gives a better diff than git diff --color-words=. :)

  • I have seen a lot of reports that xterm cannot print Unicode characters in some cases. That may at least be a starting point for a solution.

  • 2020-12-14 03:13

    For me, less (the git pager) was to blame (thanks @kostix). Experiment by disabling the pager altogether:

    git --no-pager diff p1 p2
    

    My case was commit messages containing emojis; it's fundamentally the same problem though.

    $ git log --oneline
    93a1866 <U+1F43C>
    
    $ git --no-pager log --oneline
    93a1866 🐼
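
The experiment above can be reproduced in a throwaway repository. This is only a sketch: the temporary directory, the demo identity and the commit message are made up for illustration.

```shell
# Create a scratch repository so the experiment does not touch real work.
repo=$(mktemp -d)
git -C "$repo" init -q

# Make one empty commit whose message contains an emoji (hypothetical message).
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'panda 🐼'

# Bypass the pager for this one command; the raw UTF-8 bytes reach the
# terminal untouched instead of being rewritten by less.
out=$(git -C "$repo" --no-pager log --oneline)
printf '%s\n' "$out"

# Equivalent one-off alternative: replace the pager through the environment.
#   GIT_PAGER=cat git -C "$repo" log --oneline

# Remove the scratch repository.
rm -rf "$repo"
```

If the emoji shows up here but not in a plain git log, the pager configuration is the likely culprit rather than git or the terminal.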
  • 2020-12-14 03:16

    On several platforms, setting LANG to C.UTF-8 (or en_US.UTF-8, etc.) works:

    $ echo '人' >test1.txt && echo '丁' >test2.txt
    $ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
    diff --git a/test1.txt b/test2.txt
    index 3ef0891..3773917 100644
    --- a/test1.txt
    +++ b/test2.txt
    @@ -1 +1 @@
    [-人-]{+丁+}
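
One thing worth checking when LANG appears to have no effect: on POSIX systems LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG. The helper below is a hypothetical sketch (the name effective_ctype is invented here) that reports which value wins for character classification:

```shell
# Print the locale name that governs character handling, following the
# POSIX precedence: LC_ALL first, then LC_CTYPE, then LANG.
# Empty variables are treated the same as unset ones.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

# Show the value in effect for the current shell.
effective_ctype
```

If this prints something other than a UTF-8 locale even though LANG is set, LC_ALL or LC_CTYPE is probably overriding it.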
    

    However, LANG doesn't seem to be honored on some platforms (such as Git for Windows):

    $ echo '人' >test1.txt && echo '丁' >test2.txt
    $ LANG=C.UTF-8 git diff --no-index --word-diff=plain --word-diff-regex=. -- test1.txt test2.txt
    diff --git a/test1.txt b/test2.txt
    index 3ef0891..3773917 100644
    --- a/test1.txt
    +++ b/test2.txt
    @@ -1 +1 @@
    <E4>[-<BA><BA>-]{+<B8><81>+}
    

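    A workaround on these platforms is to pass git diff a byte-level regex that matches one UTF-8 character in place of '.' (e.g. $'[^\x80-\xBF][\x80-\xBF]*', i.e. one non-continuation byte followed by any number of continuation bytes):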

    $ echo '人' >test1.txt && echo '丁' >test2.txt
    $ git diff --no-index --word-diff=plain --word-diff-regex=$'[^\x80-\xBF][\x80-\xBF]*' -- test1.txt test2.txt
    diff --git a/test1.txt b/test2.txt
    index 3ef0891..3773917 100644
    --- a/test1.txt
    +++ b/test2.txt
    @@ -1 +1 @@
    [-人-]{+丁+}
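
To see why that regex matches one character at a time, it can be tried outside git. The sketch below assumes a grep that supports -o and -a (GNU grep does) and builds the pattern with printf octal escapes so that it does not depend on bash's $'...' quoting:

```shell
# Bytes 0x80-0xBF (octal 200-277) are UTF-8 continuation bytes. One
# character is therefore a single byte outside that range followed by
# zero or more bytes inside it.
pat=$(printf '[^\200-\277][\200-\277]*')

# In the C locale grep works byte by byte, so each match printed by -o is
# one complete UTF-8 sequence. -a keeps grep from treating the input as binary.
chars=$(printf '人丁a\n' | LC_ALL=C grep -a -o "$pat")
printf '%s\n' "$chars"
```

With a byte-oriented grep this should put 人, 丁 and a on separate lines, which is the same tokenization git diff performs when given the regex as --word-diff-regex.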
    