ORDER BY with diacritic in Postgres

匆匆过客 提交于 2019-12-11 10:02:49

问题


I need to select data from table and sort them with ORDER BY clause. The problem is the column contains text data with czech diacritic. I cannot use COLLATE, because the DB is part of postgres cluster which was created with lc_collate = en_US.UTF-8 and I cannot afford downtime caused by recreating the cluster with correct lc_collate.

Sample data:

CREATE TABLE test (
  id serial PRIMARY key,
  name text
);

INSERT INTO test (name) VALUES ('Žoo'), ('Zoo'), ('ŽOO'), ('ZOO'),
  ('ŽoA'), ('ŽóA'), ('ŽoÁ'), ('ŽóÁ');

Ideal output:

SELECT * FROM test ORDER BY name COLLATE "cs_CZ.utf8";
id | name
----+------
  2 | Zoo
  4 | ZOO
  5 | ŽoA
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
  1 | Žoo
  3 | ŽOO
(8 rows)

Here I found kind of solution:

SELECT * FROM test ORDER BY name USING ~<~;
id | name
----+------
  4 | ZOO
  2 | Zoo
  3 | ŽOO
  5 | ŽoA
  1 | Žoo
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
(8 rows)

The result is close enough (for my usage) - the caroned letters are AFTER the non-caroned.


My slightly off-topic Postgresql anabasis with ~<~ operator

edit: turned into new question.


Back to the question: Is there other solution to get the ideal order besides recreating the postgres cluster with correct locale?

Also some insght on the ~<~ operator would be nice.


回答1:


Im not sure if I understand the question, because look like you already found a solution. The only think I can suggest is you can add a new field czechName with the correct collate

http://www.postgresql.org/docs/current/static/sql-altertable.html

ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]



回答2:


Similar to what Juan Carlos Oropeza suggests, you can try to change the collation of the column with:

ALTER TABLE test ALTER COLUMN "name" TYPE text COLLATE 'cs_CZ.utf8';

Reference: http://www.postgresql.org/docs/current/static/sql-altertable.html




回答3:


As @Igor pointed out in his comment, there is no need to recreate the postgres cluster with different lc_collate and deal with the caused downtime.

Exact steps that solved the problem were:

  1. add/uncomment line cs_CZ.UTF-8 UTF-8 in /etc/locale.gen

  2. generate the new locale:

    # locale-gen

  3. define new collation in postgres:

    CREATE COLLATION "cs_CZ.utf8" ( locale = 'cs_CZ.UTF-8' );



来源:https://stackoverflow.com/questions/35629454/order-by-with-diacritic-in-postgres

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!