Finding and removing non-ASCII characters from an Oracle VARCHAR2

猫巷女王i 2020-12-02 23:03

We are currently migrating one of our Oracle databases to UTF8 and we have found a few records that are near the 4000-byte VARCHAR2 limit. When we try to migrate these reco

17 Answers
  •  轻奢々 (original poster)
    2020-12-02 23:54

    Please note that whenever you use

    regexp_like(column, '[A-Z]')
    

    Oracle's regexp engine also matches certain characters from the Latin-1 range: this applies to characters that resemble ASCII letters, such as Ä -> A, Ö -> O, Ü -> U, so [A-Z] does not behave the way you may know it from other environments such as Perl.
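
    A quick way to see how your own session treats the range is to test a single character. This is only a sketch: it assumes the range behavior follows the session's NLS_SORT setting, and the two sort names used here are just examples to compare.

    ```sql
    -- UNISTR('\00C4') is the letter Ä, written as a code point so the
    -- client encoding does not affect the test.
    ALTER SESSION SET NLS_SORT = BINARY;

    SELECT CASE WHEN REGEXP_LIKE(UNISTR('\00C4'), '^[A-Z]$')
                THEN 'matched' ELSE 'not matched'
           END AS with_binary_sort
      FROM dual;

    -- Repeat with a linguistic sort and compare the two results.
    ALTER SESSION SET NLS_SORT = GERMAN;

    SELECT CASE WHEN REGEXP_LIKE(UNISTR('\00C4'), '^[A-Z]$')
                THEN 'matched' ELSE 'not matched'
           END AS with_german_sort
      FROM dual;
    ```

    If the two queries disagree, any cleanup script that relies on [A-Z] should be run under the sort setting you actually tested.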

    Instead of fiddling with regular expressions, try changing the column to the NVARCHAR2 datatype before the character set upgrade.

    Another approach: instead of cutting away part of the fields' contents, you might try the SOUNDEX function, provided your database contains European (i.e. Latin-1) characters only. Or you could write a function that translates characters from the Latin-1 range into similar-looking ASCII characters, like

    • å => a
    • ä => a
    • ö => o

    of course only for text blocks exceeding 4000 bytes when transformed to UTF-8.
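
    A minimal sketch of that idea, under some assumptions: the table my_table, its column notes, and both function names are hypothetical, the character list is deliberately incomplete, and the literals assume your client character set is configured so that å, ä and ö arrive in the database intact.

    ```sql
    -- Hypothetical helper: size of a value in bytes once encoded as UTF-8.
    -- Inside PL/SQL a VARCHAR2 can hold up to 32767 bytes, so the converted
    -- value has room to grow past the 4000-byte column limit.
    CREATE OR REPLACE FUNCTION utf8_byte_length (p_text IN VARCHAR2)
      RETURN NUMBER
      DETERMINISTIC
    IS
    BEGIN
      RETURN NVL(LENGTHB(CONVERT(p_text, 'AL32UTF8')), 0);
    END utf8_byte_length;
    /

    -- Hypothetical helper: replace a few Latin-1 letters with look-alike
    -- ASCII letters. Extend both strings in parallel to cover more characters.
    CREATE OR REPLACE FUNCTION to_ascii_lookalike (p_text IN VARCHAR2)
      RETURN VARCHAR2
      DETERMINISTIC
    IS
    BEGIN
      RETURN TRANSLATE(p_text, 'åäöÅÄÖ', 'aaoAAO');
    END to_ascii_lookalike;
    /

    -- Touch only the rows that would no longer fit after the migration.
    UPDATE my_table
       SET notes = to_ascii_lookalike(notes)
     WHERE utf8_byte_length(notes) > 4000;
    ```

    Running the WHERE clause as a plain SELECT first shows how many rows are affected. Rows that still exceed 4000 bytes after the replacement contain characters the mapping does not cover yet.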
