org.xml.sax.SAXParseException: The reference to entity "T" must end with the ';' delimiter

北海茫月 2020-12-29 07:35

I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "the reference to entity must e

9 answers
  •  暖寄归人
    2020-12-29 08:01

    As others have stated, your XML is definitely invalid (strictly speaking, not well-formed, because a bare & has to be written as &amp;). However, if you can't change the generating application and can add a cleaning step, then the following should clean up the XML:

    String clean = xml.replaceAll( "&([^;]+(?!(?:\\w|;)))", "&amp;$1" );
    

    What that regex is doing is looking for any badly formed entity references and escaping the ampersand.

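    Specifically, (?!(?:\\w|;)) is a negative look-ahead that only lets the match end where the next character is neither a word character (a-z, A-Z, 0-9, _) nor a semi-colon. Because [^;]+ is greedy, the whole regex grabs as much as it can after the & without crossing a ;, then backs off until the character that follows is neither a word character nor a semi-colon (or the input ends). A well-formed reference such as &amp; never matches, because every point inside it is followed by a word character or by the ; itself.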
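
    To make the look-ahead behaviour concrete, here is a small sketch that lists what the first capture group holds for each match. The class and method names (BareAmpersandMatches, captures) are made up for illustration and are not part of the answer; only the pattern itself comes from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original answer.
final class BareAmpersandMatches {
    // Same pattern as in the answer's replaceAll call.
    static final Pattern BARE = Pattern.compile("&([^;]+(?!(?:\\w|;)))");

    // Returns the text held by capture group 1 for each match, in order.
    static List<String> captures(String xml) {
        List<String> groups = new ArrayList<>();
        Matcher m = BARE.matcher(xml);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        // The bare "&T" matches; the well-formed "&amp;" does not.
        System.out.println(captures("A &T B &amp; C"));   // prints [T B ]
        // Two bare ampersands before the next ';' end up in one match.
        System.out.println(captures("a & b & c &lt; d")); // prints [ b & c ]
    }
}
```

    The second input shows a limitation worth knowing about: because the match is greedy, a second bare & that appears before the next semi-colon is swallowed into the first match and is not escaped on its own.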

    It puts everything except the ampersand in the first capture group so that it can be referred to in the replace string. That's the $1.
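
    As a rough end-to-end sketch of the cleaning step, the following applies the regex (with the replacement written as &amp;$1) and then hands the result to a DOM parser. The class name XmlCleanStep, the helper rootText and the sample <company> element are assumptions for illustration only; in a real application the XML would come from the generating application.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical wrapper around the answer's one-liner; names are illustrative.
final class XmlCleanStep {
    // Escapes bare ampersands: "$1" re-inserts everything captured after the '&'.
    static String clean(String xml) {
        return xml.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1");
    }

    // Parses the XML with the JDK's DOM parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<company>AT&T Inc</company>"; // made-up sample input
        String cleaned = clean(raw);
        System.out.println(cleaned);           // prints <company>AT&amp;T Inc</company>
        System.out.println(rootText(cleaned)); // prints AT&T Inc
    }
}
```

    Cleaning the text before parsing keeps the parser configuration untouched, at the cost of holding the whole document in memory as a String.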

    Note that this won't fix references that look like they are valid but aren't. For example, if you had &T; that would throw a different kind of error altogether unless the XML actually defines the entity.
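
    The caveat above can be checked with a short sketch: an input containing &T; passes through the regex unchanged, and the parser still rejects it because the entity is not declared. The class and method names here are hypothetical, and the exact wording of the parser's message depends on the XML parser in use, so the code only reports that parsing failed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check for illustration; not part of the original answer.
final class UndeclaredEntityCheck {
    // The answer's cleaning step, with the ampersand escaped in the replacement.
    static String clean(String xml) {
        return xml.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1");
    }

    // Returns a description of the parse failure, or null if the XML parsed.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of using the
            // parser's built-in handler, which typically logs to the console.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<c>AT&T; Inc</c>"; // looks like an entity reference, but T is not declared
        String cleaned = clean(raw);
        System.out.println(cleaned.equals(raw)); // prints true: the regex leaves &T; alone
        System.out.println(parseError(cleaned)); // a SAXParseException description
    }
}
```

    If such references do occur in the input, they need separate handling, for example by declaring the entity or by escaping that specific reference before parsing.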
