Question
From what I understand, PostgreSQL delegates collation to the strcoll() function of the underlying operating system, and apparently most (if not all) Linux installations take advantage of the fact that punctuation and spaces can optionally be ignored when collating in UTF-8.
For example, I have a database in Postgres 9.2 on CentOS 6.4 with
ENCODING='UTF8'
LC_COLLATE='en_US.UTF8'
LC_CTYPE='en_US.UTF8'
and I run the query
select * from (values('abc'),('ABC'),('Abc'),('...ABc'),('a BC')) x order by 1;
The results are
abc
a BC
Abc
...ABc
ABC
Mac OS X seems to honor punctuation and spaces but then uses a POSIX/C style sort. A similar database with the same settings on OS X returns
...ABc
ABC
Abc
a BC
abc
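For comparison, the OS X result above is identical to plain byte-order sorting. A minimal sketch that reproduces it on any platform with the built-in "C" collation (per-expression COLLATE is available from PostgreSQL 9.1; the column alias s is introduced here only for illustration):

```sql
-- Force byte-order comparison regardless of the database's LC_COLLATE.
-- The column must be named: "ORDER BY 1 COLLATE ..." would apply the
-- collation to the integer constant 1 rather than to the first column.
SELECT s
FROM (VALUES ('abc'), ('ABC'), ('Abc'), ('...ABc'), ('a BC')) AS x(s)
ORDER BY s COLLATE "C";
-- Result order: ...ABc, ABC, Abc, a BC, abc
```

This keeps punctuation and spaces significant, but it sorts all uppercase ASCII letters before all lowercase ones, so it is not the ICU-style ordering.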
Regardless of the operating system, I would expect a proper collation to return (and the ICU Demo Project shows)
...ABc
a BC
abc
Abc
ABC
Is there any way to get Postgres installs on any operating system, but particularly Linux, to observe proper collation in the style of ICU?
Answer 1:
As you've noted, Postgres relies on the operating system to provide collation, and there's little Postgres can do about how things get collated beyond hooking directly into ICU.
Doing so has been a recurring discussion topic over the years, but is not a trivial task:
http://wiki.postgresql.org/wiki/Todo:ICU
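Later PostgreSQL releases (version 10 onward, and only when the server is built with ICU support) provide ICU-backed collations that can be chosen per expression. This does not apply to 9.2. A minimal sketch, assuming such a build and assuming the predefined collation "en-x-icu" is present; the alias s is illustrative:

```sql
-- List the English ICU collations this server offers (PostgreSQL 10+ only;
-- collprovider = 'i' marks ICU-provided collations).
SELECT collname
FROM pg_collation
WHERE collprovider = 'i' AND collname LIKE 'en%';

-- Sort the sample values with ICU rules instead of the OS strcoll().
SELECT s
FROM (VALUES ('abc'), ('ABC'), ('Abc'), ('...ABc'), ('a BC')) AS x(s)
ORDER BY s COLLATE "en-x-icu";
```

If the first query returns no rows, the server was likely built without ICU and the second query will fail with a missing-collation error.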
Source: https://stackoverflow.com/questions/16342796/is-there-any-way-to-have-postgresql-not-collapse-punctuation-and-spaces-when-col