PostgreSQL incorrect sorting

后端 未结 2 1026
甜味超标
甜味超标 2020-12-09 18:25

I use PostgreSQL 9.3.3 and I have a table with one column named as title (character varying(50)).

When I have executed the following query:

select *         


        
相关标签:
2条回答
  • 2020-12-09 19:10

    It seems, that when sorting Oracle as well as Postgres just ignore non alpha numeric chars, e.g.

      select '*' 
       union all
      select '#' 
       union all
      select 'A'
       union all
      select '*E'
       union all
      select '*B'
       union all
      select '#C'
       union all
      select '#D'
    order by 1 asc
    

    returns (look: that DBMS doesn't pay any attention on prefix before 'A'..'E')

      *
      #
      A
      *B
      #C
      #D
      *E
    

    In your case, what Postgres actually sorts is

    '', 'A' and 'Example'

    If you put '#' in the middle od the string, the behaviour will be the same:

      select 'A#B'
       union all
      select 'AC'
       union all
      select 'A#D'  
       union all
      select 'AE' 
    order by 1 asc
    

    returns (# ignored, and so 'AB', 'AC', 'AD' and 'AE' actually compared)

      A#B
      AC
      A#D
      AE
    

    To change the comparison rules you should use collation, e.g.

      select '#' collate "POSIX"
       union all
      select 'A' collate "POSIX"
       union all
      select '#Example' collate "POSIX"
    order by 1 asc
    

    returns (as it required in your case)

      #
      #Example
      A
    
    0 讨论(0)
  • 2020-12-09 19:31

    Sort behaviour for text (including char and varchar as well as the text type) depends on the current collation of your locale.

    See previous closely related questions:

    • PostgreSQL Sort
    • https://stackoverflow.com/q/21006868/398670

    If you want to do a simplistic sort by ASCII value, rather than a properly localized sort following your local language rules, you can use the COLLATE clause

    select * 
    from test
    order by title COLLATE "C" ASC
    

    or change the database collation globally (requires dump and reload, or full reindex). On my Fedora 19 Linux system, I get the following results:

    regress=> SHOW lc_collate;
     lc_collate  
    -------------
     en_US.UTF-8
    (1 row)
    
    regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
              SELECT title FROM v ORDER BY title ASC;
     title 
    -------
     #
     a
     #a
     a#
     a#a
    (5 rows)
    
    regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
              SELECT title FROM v ORDER BY title COLLATE "C" ASC;
     title 
    -------
     #
     #a
     a
     a#
     a#a
    (5 rows)
    

    PostgreSQL uses your operating system's collation support, so it's possible for results to vary slightly from host OS to host OS. In particular, at least some versions of Mac OS X have significantly broken unicode collation handling.

    0 讨论(0)
提交回复
热议问题