Unix sort treatment of underscore character


半阙折子戏 2020-12-30 01:16

I have two Linux machines on which Unix sort seems to behave differently. I believe I've narrowed it down to the treatment of the underscore character.

If I run

5 answers

渐次进展 (original poster)

2020-12-30 01:25
I really liked the answer above with the useful example. I'd just add one more string to its list to show how strange the sorting behavior can be:
```
$ (echo 'foo_bar'; echo 'fooAbar'; echo 'foo0bar'; echo 'fooabar'; echo 'foobbar'; echo 'foobar') | LC_COLLATE=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
foo0bar
_______
fooabar
_______
fooAbar
_______
foobar
______
foo_bar
_______
foobbar
_______
```
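Seems crazy, right? I found the explanation here: in this locale, sort uses the Unicode Collation Algorithm, and the output above is consistent with the underscore carrying little or no weight on the first comparison pass, which is why foo_bar lands between foobar and foobbar: https://unix.stackexchange.com/questions/252419/unexpected-sort-order-in-en-us-utf-8-locale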

However, even the 'sort --debug' option cannot easily demonstrate the subtleties of how the strcoll() function follows the locale's collation specification.

POSIX gives locale authors (for every locale except the C locale) complete control over the fiddly details of how strcoll() behaves, and the fact that two vendors both name their locale en_US.UTF-8 does NOT imply or require that they ship the same locale definition. So the collation rules on two different platforms are very likely to differ, depending on who wrote the locale file for each platform and which bug fixes have gone into the locale definition over time.
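
Since the C locale is the one exception, one way to get the same order on both machines is to force it for the sort call. This is only a minimal sketch using the six strings from the example above; the variable names `words` and `c_sorted` are just for illustration, and byte order may not be the order you actually want for human-readable text.

```shell
# The same six strings as in the example above, one per line.
words='foo_bar
fooAbar
foo0bar
fooabar
foobbar
foobar'

# LC_ALL overrides LANG and every other LC_* variable, so with LC_ALL=C
# sort compares raw byte values instead of calling into locale collation rules.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

In byte order the underscore (0x5F) falls after the digits and uppercase letters and before the lowercase letters, so foo_bar ends up between fooAbar and fooabar instead of between foobar and foobbar. Running `locale` on each machine shows which LC_COLLATE value is actually in effect there.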

Thanks to Eric Blake at Red Hat for this insight.