Finding and removing non-ASCII characters from an Oracle VARCHAR2

猫巷女王i 2020-12-02 23:03

We are currently migrating one of our Oracle databases to UTF8 and we have found a few records that are near the 4000-byte VARCHAR2 limit. When we try and migrate these reco

17 answers
  •  轻奢々 (original poster)
    2020-12-02 23:40

    The answer given by Francisco Hayoz is the best. Don't use PL/SQL functions if SQL can do it for you.

    Here is a simple test in Oracle 11.2.03:

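    -- s holds chr(128)..chr(255), built by the inner query
    select s
         -- strip everything outside chr(1)..chr(127)
         , regexp_replace(s,'[^'||chr(1)||'-'||chr(127)||']','') "rep ^1-127"
         -- dump what is left after stripping chr(127)..chr(225)
         , dump(regexp_replace(s,'['||chr(127)||'-'||chr(225)||']','')) "rep 127-255"
    from (
    select listagg(c, '') within group (order by c) s
      from (select 127+level l,chr(127+level) c from dual connect by level < 129))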
    

    The "rep 127-255" column returns:

    Typ=1 Len=30: 226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255

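    That is, in this test chr(226) and above are not replaced. Note that the character class in the query ends at chr(225) even though the column is labelled "rep 127-255", which accounts for exactly the 30 characters (226 to 255) left in the dump; a class of '['||chr(127)||'-'||chr(255)||']' should cover the full range. If you need to replace other characters, just add them to the regex above, or use nested replace/regexp_replace if the replacement is different from '' (the null string).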
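
    To find the affected rows before cleaning them, the negated class from the first column can be reused in a REGEXP_LIKE condition. The following is a minimal sketch rather than a tested recipe: the CTE name sample_rows and the columns n, raw_bytes and cleaned_bytes are made up for illustration, and it assumes a single-byte database character set, as the test above implicitly does.

    ```sql
    -- Hypothetical sample data: one row per code 120..135, each value is 'x'
    -- followed by chr(code). Assumes a single-byte database character set.
    with sample_rows as (
      select 119 + level n
           , 'x' || chr(119 + level) s
        from dual
     connect by level <= 16
    )
    select n
         , dump(s) raw_bytes
           -- same negated class as "rep ^1-127" in the test above
         , dump(regexp_replace(s, '[^'||chr(1)||'-'||chr(127)||']', '')) cleaned_bytes
      from sample_rows
           -- keep only rows containing a character outside chr(1)..chr(127)
     where regexp_like(s, '[^'||chr(1)||'-'||chr(127)||']');
    ```

    Against a real table, the same where clause and regexp_replace call would be applied to the VARCHAR2 column in place of s. Comparing raw_bytes with cleaned_bytes is a quick way to check what the pattern actually removes on your version and NLS settings.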
