Question
I'm producing an XML document using SQL on an Oracle 11g database, but I'm having a problem with one field: the title field holds many characters, some of which XML sees as invalid. I'm trying to use the statement below to catch as many as possible and replace them with NULL.
REGEXP_REPLACE (title, '’|£|&|*|@|-|>|/|<|;|\', '', 1, 0, 'i') as title
I'm still getting the parse error, so there must be more invalid characters that I've missed. I know it's failing on this field because when I replace the column with the string literal 'title' (as below), the document parses fine.
REGEXP_REPLACE ('title', '’|£|&|*|@|-|>|/|<|;|\', '', 1, 0, 'i') as title
I'm using XML version="1.0" encoding="UTF-8". Is there an easy way around this, or do I have to locate the failing records, which could be any of 2 million? The title field holds song titles from all over the world. Could I use REGEXP_REPLACE to keep a range of characters, say between CHR(32) and CHR(255), and replace anything outside that range with NULL?
Or is there another solution?
Thanks in advance, guys.
Answer 1:
Have you considered keeping only the characters you want? I don't know what they are, but something like this:
REGEXP_REPLACE(title, '[^a-zA-Z0-9 ,.!]', '') as title
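Because the titles come from all over the world, an ASCII whitelist like the one above would also strip accented and non-Latin letters. A narrower variant, offered only as a sketch, removes control characters and leaves everything else alone. It assumes control characters are what the parser rejects, which the question does not confirm, and the table name songs is hypothetical.

```sql
-- Sketch only: "songs" is a hypothetical table name.
-- [[:cntrl:]] is the POSIX control-character class; it also matches
-- tab, line feed and carriage return, which XML 1.0 does allow.
SELECT REGEXP_REPLACE(title, '[[:cntrl:]]', '') AS title
FROM   songs;
```

If tabs or line breaks inside a title matter, the pattern would need to be narrowed to exclude them.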
Answer 2:
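Apart from code points that XML 1.0 forbids outright, such as most control characters below #x20, the only problematic characters in XML are &, < and > (as well as " or ' in attribute values).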
You can escape such characters with an Oracle function.
Example:
select DBMS_XMLGEN.CONVERT(title) from ...
Details: https://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_xmlgen.htm#i1013100
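To address the question of locating the failing records among 2 million rows, a rough sketch follows. It first lists titles that contain a control character, then combines the clean-up with DBMS_XMLGEN.CONVERT. The table name songs is a hypothetical stand-in, and control characters are only one possible cause of the parse failure.

```sql
-- Sketch only: "songs" is a hypothetical table name.

-- 1) Find the rows whose title contains at least one control character.
SELECT title
FROM   songs
WHERE  REGEXP_LIKE(title, '[[:cntrl:]]');

-- 2) Strip control characters first, then escape the XML markup characters.
SELECT DBMS_XMLGEN.CONVERT(REGEXP_REPLACE(title, '[[:cntrl:]]', '')) AS title
FROM   songs;
```

Stripping before escaping keeps the two concerns separate: characters that XML cannot represent at all are removed, and characters that are merely special in markup are encoded as entities.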
Source: https://stackoverflow.com/questions/40493076/ora-31011-xml-parsing-failed-invalid-characters-oracle-sql