Is there any way to have PostgreSQL not collapse punctuation and spaces when collating using a language?

Posted by 浪子不回头ぞ on 2019-12-10 10:28:16

Question


From what I understand, PostgreSQL delegates collation to the strcoll() function of the underlying operating system, and apparently most (if not all) Linux installations take the option of collapsing punctuation and spaces when collating in a UTF-8 locale.

For example, I have a database in Postgres 9.2 on CentOS 6.4 with

ENCODING='UTF8'
LC_COLLATE='en_US.UTF8'
LC_CTYPE='en_US.UTF8'

and run the query

select * from (values('abc'),('ABC'),('Abc'),('...ABc'),('a BC')) x order by 1;

The results are

abc
a BC
Abc
...ABc
ABC
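
To make the "collapsing" concrete, here is a toy Python model of a sort that ignores punctuation and spaces and only then breaks ties by case. It is not what glibc's strcoll() actually does internally; the function name ignoring_punctuation_key and the two-level key are assumptions made purely for illustration. It does reproduce the five-row result above.

```python
def ignoring_punctuation_key(s):
    # Hypothetical helper: drop everything that is not a letter or digit,
    # standing in for characters the locale treats as ignorable.
    letters = "".join(ch for ch in s if ch.isalnum())
    # Level 1: the letters with case folded away.
    # Level 2: tie-break with lowercase before uppercase; swapcase() flips
    # the usual code-point order in which uppercase sorts first.
    return (letters.casefold(), letters.swapcase())

rows = ["abc", "ABC", "Abc", "...ABc", "a BC"]
linux_like = sorted(rows, key=ignoring_punctuation_key)
print("\n".join(linux_like))
```

Because the dots and the space never reach the key, '...ABc' and 'a BC' are placed only by the case pattern of their letters, which is why they end up interleaved with the other rows.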

Mac OS X seems to honor punctuation and spaces but then uses a POSIX/C style sort. A similar database with the same settings on OS X returns

...ABc
ABC
Abc
a BC
abc
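
As a quick check that the OS X result really is a plain POSIX/C style sort, the Python sketch below orders the same five strings by code point, which for ASCII input matches byte order. It only illustrates the ordering; it says nothing about how OS X implements strcoll().

```python
rows = ["abc", "ABC", "Abc", "...ABc", "a BC"]

# Python compares str values code point by code point, so for ASCII data
# this is the same order a byte-wise (C locale) comparison would give.
code_point_order = sorted(rows)
print("\n".join(code_point_order))
```

In this order '.' (0x2E) sorts before every letter, all uppercase letters sort before all lowercase ones, and the space in 'a BC' (0x20) sorts before the 'b' in 'abc'.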

Regardless of the operating system, I would expect a proper collation to return (and the ICU Demo Project shows)

...ABc
a BC
abc
Abc
ABC
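
The expected order can be mimicked with a toy two-level key in Python: compare the full string, punctuation and spaces included, without regard to case, then break ties with lowercase first. This is only a sketch that happens to match the five rows above. It is not the ICU or Unicode Collation Algorithm, and punctuation_aware_key is a hypothetical name.

```python
def punctuation_aware_key(s):
    # Level 1: the whole string with case folded away; punctuation and
    # spaces are kept, so they take part in the comparison.
    # Level 2: tie-break with lowercase before uppercase; swapcase() flips
    # the usual code-point order in which uppercase sorts first.
    return (s.casefold(), s.swapcase())

rows = ["abc", "ABC", "Abc", "...ABc", "a BC"]
expected_like = sorted(rows, key=punctuation_aware_key)
print("\n".join(expected_like))
```

The only difference from a punctuation-ignoring sort is that the dots and the space stay in the first-level key, which moves '...ABc' and 'a BC' ahead of the three case variants of 'abc'.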

Is there any way to get Postgres installs on any operating system, but particularly Linux, to observe proper collation in the style of ICU?


Answer 1:


As you've noted, Postgres relies on the operating system to provide collation, and there's little Postgres can do about how things get collated beyond hooking directly into ICU.

Doing so has been a recurring discussion topic over the years, but it is not a trivial task:

http://wiki.postgresql.org/wiki/Todo:ICU



Source: https://stackoverflow.com/questions/16342796/is-there-any-way-to-have-postgresql-not-collapse-punctuation-and-spaces-when-col
