Finding and removing non-ASCII characters from an Oracle VARCHAR2


猫巷女王i 2020-12-02 23:03

We are currently migrating one of our Oracle databases to UTF8 and have found a few records that are near the 4000-byte varchar limit. When we try to migrate these reco

17 answers

轻奢々 (original poster)

2020-12-02 23:54
Please note that whenever you use
```
regexp_like(column, '[A-Z]')
```
Oracle's regexp engine will also match certain characters from the Latin-1 range. This applies to characters that resemble ASCII letters, such as Ä->A, Ö->O, and Ü->U, so [A-Z] does not behave the way you may know it from other environments such as Perl.
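
To illustrate the difference, here is a minimal sketch in Python (outside the database) that tests characters by code point instead of by a range that may be interpreted linguistically. The helper names `find_non_ascii` and `is_ascii_upper` are made up for this example, and it assumes the column values have already been fetched as strings.

```python
def find_non_ascii(text):
    """Return (index, character) pairs for every character outside 7-bit ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def is_ascii_upper(ch):
    """True only for the 26 unaccented letters A-Z.

    Python compares strings by code point, so accented letters
    such as U+00C4 fall outside this range.
    """
    return "A" <= ch <= "Z"
```

A check like this treats Ä as non-ASCII, which is the behaviour most people expect from [A-Z].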

Instead of fiddling with regular expressions, consider switching to the NVARCHAR2 datatype before the character set upgrade.

Another approach: instead of cutting away part of the fields' contents, you might try the SOUNDEX function, provided your database contains only European (i.e. Latin-1) characters. Or you could write a function that translates characters from the Latin-1 range into similar-looking ASCII characters, such as:
- å => a
- ä => a
- ö => o

This would, of course, be applied only to text blocks that exceed 4000 bytes when converted to UTF-8.
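
A rough sketch of that translation idea in Python, again outside the database: the mapping covers only the three characters listed above, and `LATIN1_TO_ASCII`, `utf8_length`, and `fit_to_limit` are hypothetical names introduced for illustration. It assumes the 4000-byte limit from the question and leaves any value that already fits untouched.

```python
MAX_BYTES = 4000  # byte limit mentioned in the question

# Illustrative mapping only; extend it for the characters present in your data.
LATIN1_TO_ASCII = str.maketrans({"å": "a", "ä": "a", "ö": "o"})


def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fit_to_limit(text, max_bytes=MAX_BYTES):
    """Return text unchanged if it fits, otherwise with mapped characters replaced."""
    if utf8_length(text) <= max_bytes:
        return text
    return text.translate(LATIN1_TO_ASCII)
```

Note that the result can still exceed the limit if the text contains multi-byte characters that are not in the mapping, so the length should be checked again after translation.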