It looks like the PostgreSQL upper/lower functions do not handle certain characters in the Turkish character set.
select upper('Aaı'), lower('Aaİ') from
This is indeed a bug in PostgreSQL (still not fixed, even in the current git tree). Proof: https://github.com/postgres/postgres/blob/master/src/port/pgstrcasecmp.c
The PostgreSQL developers even specifically mention those Turkish characters there:
SQL99 specifies Unicode-aware case normalization, which we don't yet have the infrastructure for. Instead we use tolower() to provide a locale-aware translation. However, there are some locales where this is not right either (eg, Turkish may do strange things with 'i' and 'I'). Our current compromise is to use tolower() for characters with the high bit set, and use an ASCII-only downcasing for 7-bit characters.
pg_toupper(), implemented in this file, is extremely simplistic (as is its companion pg_tolower()):
    unsigned char
    pg_toupper(unsigned char ch)
    {
        if (ch >= 'a' && ch <= 'z')
            ch += 'A' - 'a';
        else if (IS_HIGHBIT_SET(ch) && islower(ch))
            ch = toupper(ch);
        return ch;
    }
As you can see, this code does not treat its parameter as a Unicode code point, and cannot possibly work 100% correctly unless the currently selected locale happens to be the one we care about (such as a Turkish non-Unicode locale) and the OS-provided non-Unicode toupper() works correctly.
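To make the point concrete, here is a minimal sketch of what happens if that function is applied byte by byte to UTF-8 text. The IS_HIGHBIT_SET definition and the upper_bytes() helper are my own assumptions for illustration (they are not copied from the PostgreSQL tree), and the program is assumed to run in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed definition for this sketch: true when the top bit of the byte is set. */
#define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & 0x80)

/* Same logic as the pg_toupper() quoted above. */
static unsigned char
pg_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch += 'A' - 'a';
    else if (IS_HIGHBIT_SET(ch) && islower(ch))
        ch = toupper(ch);
    return ch;
}

/* Hypothetical helper: apply pg_toupper() to every byte of buf, in place. */
void
upper_bytes(char *buf)
{
    assert(buf != NULL);
    for (size_t i = 0; buf[i] != '\0'; i++)
        buf[i] = (char) pg_toupper((unsigned char) buf[i]);
}
```

In UTF-8 the dotless ı (U+0131) is the two bytes C4 B1. Each byte is looked at on its own, neither is a lowercase letter in the "C" locale, so calling upper_bytes() on "Aaı" uppercases the ASCII letters and leaves the ı untouched. A single-byte function has no way to see the code point.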
This is really sad, I just hope that this will be solved in upcoming PostgreSQL releases...
It seems to me that your problem is related to Windows. This is how it looks on Ubuntu (Postgres 8.4.14), database encoding UTF-8:
test=# select upper('Aaı'), lower('Aaİ');
upper | lower
-------+-------
AAI | aai
(1 row)
My recommendation would be - if you have to use Windows - to write a stored procedure that will do the conversion for you. Use built-in replace: replace('abcdefabcdef', 'cd', 'XX') returns abXXefabXXef. There might be a more optimal solution, I do not claim that this approach is the correct one.
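The substitution idea itself is small. Below is a rough sketch of it in C rather than as a stored procedure, just to show which replacements are needed for uppercasing; tr_upper_utf8() is a hypothetical name, the input is assumed to be valid UTF-8, and only ASCII letters plus the dotless ı are handled (other Turkish letters such as ş, ğ, ç, ö, ü are copied unchanged).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): uppercase a UTF-8 string
 * using the Turkish rules i -> İ and ı -> I.
 * out must have room for 2 * strlen(in) + 1 bytes, because a one-byte 'i'
 * becomes the two-byte İ. Returns the number of bytes written, excluding
 * the terminating NUL.
 */
size_t
tr_upper_utf8(const char *in, char *out)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) in[i];

        if (c == 'i')
        {
            /* i -> İ (U+0130, UTF-8 bytes C4 B0) */
            out[o++] = (char) 0xC4;
            out[o++] = (char) 0xB0;
        }
        else if (c == 0xC4 && (unsigned char) in[i + 1] == 0xB1)
        {
            /* ı (U+0131, UTF-8 bytes C4 B1) -> I; skip the second byte */
            out[o++] = 'I';
            i++;
        }
        else if (c >= 'a' && c <= 'z')
            out[o++] = (char) (c - ('a' - 'A'));    /* plain ASCII upcase */
        else
            out[o++] = (char) c;                    /* everything else as-is */
    }
    out[o] = '\0';
    return o;
}
```

A stored procedure built on replace would do the same two special-case substitutions before handing the rest to upper; the lowercase direction needs the mirror image (İ -> i, I -> ı).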
Your problem is 100% Windows. (Or rather Microsoft Visual Studio, which PostgreSQL was built with, to be more precise.)
For the record, SQL UPPER ends up calling Windows' LCMapStringW (via towupper via str_toupper) with almost all the right parameters (locale 1055 Turkish for a UTF-8-encoded, Turkish_Turkey database), but the Visual Studio Runtime (towupper) does not set the LCMAP_LINGUISTIC_CASING bit in LCMapStringW's dwMapFlags. (I can confirm that setting it does the trick.) This is not considered a bug at Microsoft; it is by design, and will probably never be "fixed" (oh, the joys of legacy).
You have three ways out of this:
[…] MSVCR100.DLL in your PostgreSQL bin directory (but although UPPER and LOWER would work, other things such as collation may continue to fail -- again, at the Windows level. YMMV.)

For completeness (and nostalgic fun) ONLY, here is the procedure to patch a Windows system (but remember, unless you'll be managing this PostgreSQL instance from cradle to grave you may cause a lot of grief to your successor(s); whenever deploying a new test or backup system from scratch you or your successor(s) would have to remember to apply the patch again -- and if, let's say, you one day upgrade to PostgreSQL 10, which, say, uses MSVCR120.DLL instead of MSVCR100.DLL, then you'll have to try your luck with patching the new DLL, too.)

On a test system:
C:\WINDOWS\SYSTEM32\MSVCR100.DLL […] bin directory (do not attempt to copy the file using Explorer or the command line, they might copy the 64bit version)
4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 00 (original bytes)
4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 01 (patched bytes)
FC 51 6A 01 8D 4D 08 51 68 00 02 00 00 50 E8 E2 (original bytes)
FC 51 6A 01 8D 4D 08 51 68 00 02 00 01 50 E8 E2 (patched bytes)
[…] bin directory, then restart PostgreSQL and re-run your query.
[…] Turkish_Turkey for both LC_CTYPE and LC_COLLATE) open postgres.exe in 32-bit Dependency Walker and make sure it indicates it loads MSVCR100.DLL from the PostgreSQL bin directory.
[…] bin directory and restart.

BUT REMEMBER, the moment you move the data off the Ubuntu system or off the patched Windows system to an unpatched Windows system you will have the problem again, and you may be unable to import this data back on Ubuntu if the Windows instance introduced duplicates in a citext field or in an UPPER/LOWER-based function index.