Unix sort treatment of underscore character

半阙折子戏 2020-12-30 01:16

I have two Linux machines on which Unix sort seems to behave differently. I believe I've narrowed it down to the treatment of the underscore character.

If I run

5 answers
  •  一生所求
    2020-12-30 01:31

    This is likely caused by a difference in locale. In the en_US.UTF-8 locale, underscores (_) sort after letters and numbers, whereas in the POSIX C locale they sort after uppercase letters and numbers, but before lowercase letters. To get the C ordering for a single command, prefix it with LC_COLLATE=C:

    # LC_COLLATE=C applies to this one command only; the shell's own setting is unchanged afterwards
    $ LC_COLLATE=C sort filename
    

    You can also use sort --debug to show more information about the sorting behavior in general:

    $ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') |
          LC_COLLATE=en_US.UTF-8 sort --debug
    sort: using ‘en_US.UTF-8’ sorting rules
    foo0bar
    fooabar
    fooAbar
    foo_bar
    
    $ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') | 
          LC_COLLATE=C sort --debug
    sort: using simple byte comparison
    foo0bar
    fooAbar
    foo_bar
    fooabar
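
    The "simple byte comparison" order follows from the byte values of the characters that differ between the four lines. A minimal sketch, assuming an ASCII-compatible codeset; byte_value is a hypothetical helper name, not part of sort:

    ```shell
    # Hypothetical helper: print the numeric byte value of a single character.
    # A leading quote in a printf argument ("'c") yields the value of the character c.
    byte_value() {
        printf '%d\n' "'$1"
    }

    # The fourth character of each sample line: 0, A, _, a
    for ch in 0 A _ a; do
        printf '%s = %s\n' "$ch" "$(byte_value "$ch")"
    done
    # Prints:
    # 0 = 48
    # A = 65
    # _ = 95
    # a = 97
    ```

    Since 48 < 65 < 95 < 97, the C locale places foo0bar first, then fooAbar, foo_bar, and fooabar, which matches the output above.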
    

    As also shown in this answer, the LC_COLLATE=C prefix used above forces C collation for a single command, without modifying your shell environment.
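
    One caveat: if LC_ALL is set in the environment, it takes precedence over LC_COLLATE, so a per-command LC_COLLATE=C may have no effect on that machine. A small sketch that sidesteps this by setting LC_ALL instead; sort_bytewise is a hypothetical wrapper name chosen for illustration:

    ```shell
    # Hypothetical wrapper: sort in plain byte order regardless of the caller's locale.
    # LC_ALL is used because it overrides LC_COLLATE and LANG when set.
    sort_bytewise() {
        LC_ALL=C sort "$@"
    }

    printf '%s\n' foo_bar fooAbar foo0bar fooabar | sort_bytewise
    # Prints:
    # foo0bar
    # fooAbar
    # foo_bar
    # fooabar
    ```

    Running locale on each machine shows which of LANG, LC_COLLATE, and LC_ALL are set, which should reveal why the two machines disagree.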
