PostgreSQL and Unicode table names: why can't I select the table name from the information schema when it contains Unicode characters?

Question

I created a table with a Unicode character in its name (specifically to test Unicode table names). The table was created fine, but my method for detecting whether the table exists broke!

Here is the interaction in question:

caribou_test=# select table_name from information_schema.tables where table_schema = 'public';
 table_name  
-------------
...
 pinkpink1
(16 rows)

caribou_test=# select table_name from information_schema.tables where table_schema = 'public' and table_name = 'pinkƒpink1';
 table_name 
------------
(0 rows)

caribou_test=# select table_name from information_schema.tables where table_schema = 'public' and table_name = 'pinkpink1';
 table_name 
------------
(0 rows)

caribou_test=# select * from pinkƒpink1;
 id | position | env_id | locked |         created_at         |       updated_at        | status_id | status_position | i1l0  |  f∆   |  growth555   
----+----------+--------+--------+----------------------------+-------------------------+-----------+-----------------+-------+-------+--------------
  1 |        0 |      1 | f      | 2013-06-27 14:50:34.228136 | 2013-06-27 14:50:34.227 |         1 |               0 | YELLL | 55555 | 1.3333388822
(1 row)

The table name is pinkƒpink1 (test data). As you can see, when I list the table names from information_schema.tables, the name is displayed without the ƒ, yet filtering on the name returns nothing whether I include the ƒ or not. I can still run selects against the table directly, though. What is going on here?

EDIT: providing requested information for @craig-ringer:

caribou_test=# SELECT current_setting('server_encoding') AS server_encoding, current_setting('client_encoding') AS client_encoding, version();
 server_encoding | client_encoding |                                                                    version                                                                     
-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------
 UTF8            | UTF8            | PostgreSQL 9.2.2 on x86_64-apple-darwin12.2.1, compiled by Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn), 64-bit

caribou_test=# SELECT * FROM pg_class WHERE relname = 'pinkƑpink1';
--->  (0 rows)

caribou_test=# SELECT upper('ƒ') = 'Ƒ', lower('Ƒ') = 'ƒ';
 ?column? | ?column? 
----------+----------
 t        | t
(1 row)

caribou_test=# WITH chars(rowid, thechar) AS (VALUES (1,'ƒ'),(2,'Ƒ'),(3,upper('ƒ')),(4,lower('Ƒ'))) SELECT rowid, thechar, convert_to(thechar, 'utf-8') from chars;
 rowid | thechar | convert_to 
-------+---------+------------
     1 | ƒ       | \xc692
     2 | Ƒ       | \xc691
     3 | Ƒ       | \xc691
     4 | ƒ       | \xc692
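
For a client-side cross-check of the bytes above, here is a minimal Python sketch. Python is not part of the original session, and the helper name utf8_hex is made up for illustration. It prints the code point, Unicode name, and UTF-8 bytes of each character, which helps rule out the terminal sending something other than U+0192.

```python
import unicodedata


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as a hex string (hypothetical helper)."""
    return ch.encode("utf-8").hex()


# Same four values as the chars CTE in the query above.
for ch in ("ƒ", "Ƒ", "ƒ".upper(), "Ƒ".lower()):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), utf8_hex(ch))
```

The hex values should match the convert_to column: c692 for ƒ and c691 for Ƒ.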

Answer 1:

It looks like a bug, perhaps in regclass or something related to it:

# create table pinkƒpink1 (id serial);
NOTICE:  CREATE TABLE will create implicit sequence "pink?pink1_id_seq" for serial column "pink?pink1.id"
CREATE TABLE
# select 'pinkƒpink1'::name;
    name    
------------
 pinkƒpink1
(1 row)

# select 'pinkƒpink1'::regclass;
  regclass   
-------------
 "pinkpink1"
(1 row)

# select relname from pg_class where oid = 'pinkƒpink1'::regclass;
  relname  
-----------
 pinkpink1

# select relname from pg_class where relname = 'pinkƒpink1'::name;
 relname 
---------
(0 rows)

# select relname from pg_class where relname = 'pinkpink1';
 relname 
---------
(0 rows)

(My system is OS X Lion with everything set to UTF-8, in case it matters.)

As a workaround, you can cast the name to regclass, as done above (that was the one query that found the table). Note that casting to regclass raises an error if the table doesn't exist, though, so code around that accordingly.
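
One way to code around that error from application code, as a rough sketch only. It assumes a psycopg2-style DB-API connection (the %s placeholder and a pgcode attribute on exceptions are that driver's conventions, not something shown in this thread), and the function name table_exists is hypothetical.

```python
# SQLSTATE for "relation does not exist" (undefined_table).
UNDEFINED_TABLE = "42P01"


def table_exists(conn, table_name):
    """Return True if casting table_name to regclass succeeds.

    Assumes conn is a psycopg2-style connection. Only the undefined_table
    error is treated as "missing"; any other error is re-raised.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT %s::regclass", (table_name,))
        return True
    except Exception as exc:
        if getattr(exc, "pgcode", None) == UNDEFINED_TABLE:
            # The failed statement aborts the current transaction, so roll
            # it back. This discards any other uncommitted work on conn.
            conn.rollback()
            return False
        raise
    finally:
        cur.close()
```

Because the rollback throws away the whole open transaction, a check like this is best run on its own connection or before other work starts.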

Per Craig's request:

# SELECT current_setting('server_encoding') AS server_encoding, current_setting('client_encoding') AS client_encoding, version();
 server_encoding | client_encoding |                                                              version                                                              
-----------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------
 UTF8            | UTF8            | PostgreSQL 9.2.4 on x86_64-apple-darwin11.4.2, compiled by Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), 64-bit
(1 row)

And per Erwin's:

# SELECT name, setting FROM pg_settings WHERE  name IN ('lc_collate','lc_ctype','client_encoding','server_encoding');
      name       |   setting   
-----------------+-------------
 client_encoding | UTF8
 lc_collate      | en_US.UTF-8
 lc_ctype        | en_US.UTF-8
 server_encoding | UTF8
(4 rows)

Answer 2:

I tested your case locally with Postgres 9.1.9 and it just works.

Same in this SQLfiddle with Postgres 9.2.4. It just works.

It must be something that is not in your question ...

OSX?

Seems to be reproducible on OSX.

To help debug this you should provide more information.

Server encoding, client encoding, locale settings:

SELECT name, setting
FROM   pg_settings
WHERE  name IN ('lc_collate','lc_ctype','client_encoding','server_encoding');

Which client? How do you connect?

ƒ is the lower-case form of Ƒ. Postgres depends on the underlying OS for locale settings. When you query the information schema or the catalog tables, you need to supply the exact string (case-sensitive!). But when you use an identifier without double-quoting in an SQL statement, it is folded to lower case first. If your locale for some reason thinks it has to convert ƒ to some lower-case equivalent, this would explain everything we have seen.

To rule this out (or verify), try your test with and without double-quoting:

CREATE TEMP TABLE "pinkƒpink1" (id int);
CREATE TEMP TABLE pinkƒpink1 (id int);

In my test under Debian Linux, both result in the same table name, so the second command fails because the table already exists. I suspect it is different in your case, which would explain the whole matter.
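
To make the hypothesis concrete, here is a purely illustrative Python sketch. It assumes that unquoted identifiers get lower-cased one byte at a time using single-byte (Latin-1 style) case rules. That is an assumption, not something confirmed in this thread, and the function name bytewise_latin1_lower is made up.

```python
def bytewise_latin1_lower(identifier: str) -> bytes:
    """Lower-case the UTF-8 bytes of identifier one byte at a time.

    Hypothetical model: each byte is treated as a Latin-1 character, so the
    lead byte of a multibyte sequence can be mistaken for an upper-case letter.
    """
    out = bytearray()
    for b in identifier.encode("utf-8"):
        if 0x41 <= b <= 0x5A:
            # ASCII A-Z
            b += 0x20
        elif 0xC0 <= b <= 0xDE and b != 0xD7:
            # Latin-1 upper-case letters (0xD7 is the multiplication sign)
            b += 0x20
        out.append(b)
    return bytes(out)


mangled = bytewise_latin1_lower("pinkƒpink1")
print(mangled)
try:
    mangled.decode("utf-8")
except UnicodeDecodeError:
    print("the folded name is no longer valid UTF-8")
```

Under this model the lead byte 0xc6 of ƒ becomes 0xe6, so the stored name would match neither 'pinkƒpink1' nor 'pinkpink1' as a string literal, while an unquoted pinkƒpink1 in a statement would be folded the same way and still find the table. That would be consistent with the output in the question, but it remains a guess.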

Source: https://stackoverflow.com/questions/17353469/postgresql-and-unicode-table-names-why-can-i-not-select-the-table-name-from-the

Tags

macos

postgresql

unicode