I want to use REGEXP_INSTR() within an oracle database to check for lower/uppercase characters. I\'m aware of [:upper:] and [:lower:] POSI
Okay, the answer that NLS_SORT causes this behavior is correct, but I don't think it explains it in an understandable way. None of the documentation I found actually does that...
You have to imagine that the character ranges defined by [a-z] are actually derived from a single substring of all possible characters which are sorted depending on NLS_SORT.
Lets assume the whole alphabet is just alphanumerical characters. Sorted by BINARY this results in a base string like 0123456789abcdefgh...xyzABCDE...XYZ.
Derived from this, [0-6] expands to [0123456], [a-f] to [abcdef], [5-b] to [56789ab] etc.
Sorted by a linguistic_definition however results in a different base string, like 0123456789aAbBcCdDeF...xXyYzZ.
Derived from this, [0-6] still expands to [0123456], but [a-f] now expands to [aAbBcCdDeEf] and [5-b] to [56789aAb] etc...
This is why a did not match [A-Z], but b did. [A-Z] actually expands to [AbBcC...yYzZ] which includes z but not a.
In reality [A-Z] might even contain more characters, like [aAàáâÀÁÂ...] etc.
The reason for the behavior is the collation rules. See the NLS_SORT documentation:
- If the value is BINARY, then the collating sequence for ORDER BY queries is based on the numeric value of characters (a binary sort that requires less system overhead).
- If the value is a named linguistic sort, sorting is based on the order of the defined linguistic sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.
Set the NLS_SORT to BINARY so that the [A-Z] could be parsed in the same order as in the ASCII table,
alter session set nls_sort = 'BINARY'
Then, you will get consistent results.
See the online demo.