Set Order By to ignore punctuation on a per-column basis

梦想与她 submitted on 2019-12-14 04:16:40

Question


Is it possible to order the results of a PostgreSQL query by a title field that contains characters like [](),; etc., but ignore these punctuation characters and sort only by the text characters?

I've read articles on changing the database collation or locale, but have not found any clear instructions on how to do this on an existing database and on a per-column basis. Is this even possible?


Answer 1:


If you want this ordering in one particular query, you can use:

ORDER BY regexp_replace(title, '[^a-zA-Z]', '', 'g')

This deletes every character that is not an ASCII letter (A-Z, a-z) from the string and orders by the result.
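
As a minimal, self-contained sketch of that ORDER BY (the inline VALUES list and the sample titles are invented for illustration; in practice you would select from your own table):

```sql
-- Hypothetical sample data; replace the VALUES list with your own table.
SELECT title
FROM  (VALUES ('[Zebra]'), ('(apple)'), ('"Mango";')) AS t(title)
-- Sort key: the title with everything except ASCII letters removed.
ORDER  BY regexp_replace(title, '[^a-zA-Z]', '', 'g');
```

Digits and spaces are stripped from the sort key too, and how upper- and lower-case letters order relative to each other still depends on the collation in effect.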




Answer 2:


"Normalize" for sorting

You could use regexp_replace() with the pattern '[^a-zA-Z]' in the ORDER BY clause, but that only recognizes pure ASCII letters. Better use the class shorthand '\W' (non-word characters): removing those keeps additional non-ASCII letters in your locale like äüóèß etc. (it also keeps digits and the underscore). Or you could improvise and "normalize" all characters with diacritic elements to their base form with the help of the unaccent() function. Consider this little demo:

SELECT *
      , regexp_replace(x, '[^a-zA-Z]', '', 'g') AS ascii_letters_only
      , regexp_replace(x, '\W', '', 'g') AS word_chars_only
      , regexp_replace(unaccent(x), '\W', '', 'g') AS unaccented_word_chars
FROM  (
SELECT 'XY ÖÜÄöüäĆČćč€ĞğīїıŁłŃńŇňŐőōŘřŠšŞşůŽžż‘´’„“”­–—[](),;.:̈� XY'::text AS x) t

->SQLfiddle for Postgres 9.2.
->SQLfiddle for Postgres 9.1.

Regular expression code has been updated in version 9.2. I am assuming this is the reason for the improved handling in 9.2 where all letter characters in the example are matched, while 9.1 only matches some.

unaccent() is provided by the additional module unaccent. Run:

CREATE EXTENSION unaccent;

once per database to use it (Postgres 9.1 or later; older versions use a different technique).
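
With the extension installed, the same idea can be applied directly in an ORDER BY. A sketch, assuming a hypothetical table bar with a text column title (both names are placeholders, not from the question):

```sql
-- Assumes CREATE EXTENSION unaccent; has been run in this database
-- and that a hypothetical table bar(title text) exists.
SELECT title
FROM   bar
-- unaccent() folds diacritics (e.g. 'Hôtel' -> 'Hotel'); removing '\W'
-- then strips everything that is not a letter, digit or underscore.
ORDER  BY regexp_replace(unaccent(title), '\W', '', 'g');
```

Only the sort key is normalized; the title column is returned unchanged.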

locales / collation

You must be aware that Postgres relies on the underlying operating system for locales (including collation). The sort order is governed by your chosen locale, or more specifically by LC_COLLATE. More in this related answer:
String sort order (LC_COLLATE and LC_CTYPE)

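There are plans to incorporate collation support into Postgres directly, but that is not available in the versions discussed here (9.1 and 9.2). Later releases, starting with Postgres 10, added ICU-based collations.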

Many locales ignore the special characters you describe for sorting character data out of the box. If you have a locale installed in your system that provides the sort order you are looking for, you can use it ad-hoc in Postgres 9.1 or later:

SELECT foo FROM bar ORDER BY foo COLLATE "xy_XY"
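
Since the question asks for a per-column setting: from Postgres 9.1 on, a collation can also be attached to a column, so that a plain ORDER BY on that column uses it. A sketch with the same placeholder names (bar, foo and "xy_XY" stand in for your table, your column and an installed collation; the column type text is an assumption):

```sql
-- Placeholder names: bar, foo and "xy_XY" are not real objects.
-- Assumes foo is of type text; keep the column's actual type here.
ALTER TABLE bar
  ALTER COLUMN foo TYPE text COLLATE "xy_XY";

-- A plain ORDER BY now sorts by the column's collation.
SELECT foo FROM bar ORDER BY foo;
```

An explicit COLLATE clause in a query still overrides the column's default.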

To see which collations are installed and available in your current Postgres installation:

SELECT * FROM pg_collation;
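
A narrower variant of the same lookup, as a sketch that picks only the columns usually of interest (column names as in the pg_collation system catalog):

```sql
-- List collation names with their LC_COLLATE / LC_CTYPE settings.
SELECT collname, collcollate, collctype
FROM   pg_collation
ORDER  BY collname;
```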

Unfortunately, in Postgres 9.1 and 9.2 it is not possible to define your own custom collation rules unless you hack the source code.

The collation rules usually follow the conventions of a language as spoken in a particular country: the order telephone books would use, if there were still telephone books. Your operating system provides these rules.

For instance, in Debian Linux you can use:

locale -a

to display all generated locales. And:

dpkg-reconfigure locales

as root user (one way of several) to generate / install more.



Source: https://stackoverflow.com/questions/17410742/set-order-by-to-ignore-punctuation-on-a-per-column-basis
