regexp for all accented characters in Oracle

匆匆过客 提交于 2020-01-04 15:15:17

问题


I am trying to find data that has accented characters. I've tried this:

select *
from xml_tmp
where regexp_like (XMLTYpe.getClobVal(xml_tmp.xml_data), unistr('\0090'))

And it works. It finds all records where the XML data field contains É. The problem is that it only matches the upper-case E with an accent. I tried to write a more generic query to find ALL data with accented vowels (a, e, i, o, u, upper and lowercase, with any accents) using equivalence classes. I wanted a regex to match only accented vowels, but I'm not sure how to get it, as equivalence classes such as [[=e=]] match all e's (with or without accents).

Also, this does not actually work:

select *
from xml_tmp
where regexp_like (XMLTYpe.getClobVal(xml_data),'É');

(using Oracle 10g)


回答1:


How about

SELECT *
  FROM xml_tmp
 WHERE REGEXP_LIKE
        ( REGEXP_REPLACE
          ( XMLTYpe.getClobVal(xml_tmp.xml_data),
            '[aeiouAEIOU]',
            '-'
          )
          '[[=a=][=e=][=i=][=o=][=u=]]'
        )
;

? That will eliminate any unaccented vowels before performing the REGEXP_LIKE.

(It's ugly, I know. But it should work.)




回答2:


After some more experimenting, I have found that this seems to work ok:

select *
from xml_tmp
where regexp_like(XMLTYpe.getClobVal(xml_data),'[^[:graph:][:space:]]') 

I had thought that [:graph:] would include all upper and lower case characters, with or without accents, but it seems that it only matches unaccented characters.


Further experimentation shows that this might not work in all cases. Try these queries:

select *
from dual
where regexp_like (unistr('\0090'),'[^[:graph:][:space:]]');
DUMMY
-------
X
(the match succeeded)

So it looks like the character that's been causing me trouble matches this pattern.

select *
from dual
where regexp_like ('É','[^[:graph:][:space:]]');
DUMMY
-------

(the match failed)

When I try to run this query with the accented E as copied-and-pasted, the match fails! I guess whatever I copied-and-pasted is actually different. Ugh, I think I now hate working with changing character encodings.



来源:https://stackoverflow.com/questions/9020610/regexp-for-all-accented-characters-in-oracle

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!