I have two Linux machines on which Unix sort seems to behave differently. I believe I've narrowed it down to the treatment of the underscore character.
If I run
This is likely caused by a difference in locale. In the en_US.UTF-8 locale, underscores (_) sort after letters and numbers, whereas in the POSIX C locale they sort after uppercase letters and numbers, but before lowercase letters.
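To confirm that the two machines really do differ, you can check which locale governs collation on each. The following is a minimal sketch, not part of the original answer: effective_collate is a hypothetical helper name, and it assumes the usual POSIX precedence (LC_ALL first, then LC_COLLATE, then LANG) with a fallback to the C locale when none is set, which is what glibc does.

```shell
# Hypothetical helper: report which locale name controls collation
# for the current environment.
# Assumed precedence: LC_ALL, then LC_COLLATE, then LANG.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        echo "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        # Nothing set: assume the default C locale (true for glibc).
        echo "C (default)"
    fi
}

effective_collate
```

Run it on both machines and compare the results. Where the locale utility is installed, running locale with no arguments reports the same settings.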
# LC_COLLATE=C applies to this command only; your shell's environment is unchanged
$ LC_COLLATE=C sort filename
You can also use sort --debug to show more information about the sorting behavior in general:
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') |
LC_COLLATE=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
foo0bar
fooabar
fooAbar
foo_bar
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar') |
LC_COLLATE=C sort --debug
sort: using simple byte comparison
foo0bar
fooAbar
foo_bar
fooabar
As also shown in this answer, you can prefix a command with LC_COLLATE=C, as above, to force the C collation for that single command without modifying your shell environment.
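One caveat: if LC_ALL is set in your environment, it takes precedence over LC_COLLATE, so the per-command override may appear to have no effect. A small wrapper that sets LC_ALL=C avoids this. This is only a sketch; csort is a hypothetical name, not a standard utility.

```shell
# Hypothetical wrapper: sort by plain byte order, regardless of the
# caller's locale settings. LC_ALL is used because it overrides
# LC_COLLATE when both are set.
csort() {
    LC_ALL=C sort "$@"
}

# Same sample input as in the --debug examples above.
printf '%s\n' foo_bar fooAbar foo0bar fooabar | csort
```

As with the LC_COLLATE=C prefix, the assignment applies only to the sort invocation inside the function, so the rest of your shell session keeps its locale.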