I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "the reference to entity must e
As others have stated, your XML is definitely not well-formed. However, if you can't change the generating application but can add a cleaning step, the following should clean up the XML:
String clean = xml.replaceAll( "&([^;]+(?!(?:\\w|;)))", "&amp;$1" );
What that regex is doing is looking for any badly formed entity references and escaping the ampersand.
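Specifically, (?!(?:\\w|;)) is a negative look-ahead that only lets the match end at a position where the next character is neither a word character (a-z, A-Z, 0-9, _) nor a semi-colon. So the whole regex grabs the & plus the run of non-semi-colon characters after it, ending just before a non-word, non-semi-colon character or at the end of the input. A well-formed reference such as &amp; never matches, because every possible end point inside it is followed by a word character or by the semi-colon.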
It puts everything except the ampersand in the first capture group so that it can be referred to in the replace string. That's the $1.
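To see the cleaning step in action, here is a minimal sketch, assuming the replacement string is `&amp;$1`. The class name, method names, and sample strings are made up for illustration. One thing it shows is that the greedy `[^;]+` can swallow a later bare ampersand into the same match, so that one stays unescaped. The `cleanStrict` method is an alternative that is not part of the answer above: it moves the look-ahead directly after the `&` and escapes any ampersand that does not start something shaped like `&name;`, `&#123;` or `&#x1F;`.

```java
import java.util.regex.Pattern;

class AmpersandCleaner {
    // The regex from the answer: '&' plus the following non-';' characters, ending
    // where the next character is neither a word character nor ';'.
    static String clean(String xml) {
        return xml.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1");
    }

    // Alternative pattern (an assumption, not the answer's method): match a single '&'
    // that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z_][\\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String cleanStrict(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(clean("<a>Tom & Jerry</a>"));      // <a>Tom &amp; Jerry</a>
        System.out.println(clean("<a>x &amp; y</a>"));        // unchanged
        System.out.println(clean("<a>a & b & c</a>"));        // <a>a &amp; b & c</a>
        System.out.println(cleanStrict("<a>a & b & c</a>"));  // <a>a &amp; b &amp; c</a>
    }
}
```

If the input can contain several bare ampersands with no semi-colon between them, the stricter variant is probably the safer choice. Neither version understands CDATA sections or comments, where a literal & is legal.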
Note that this won't fix references that look like they are valid but aren't. For example, if you had &T; that would throw a different kind of error altogether unless the XML actually defines the entity.
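The difference between an escaped ampersand and a reference-shaped but undeclared entity can be checked with the JDK's own DOM parser. This is a rough sketch; the class name, helper methods, and sample documents are hypothetical, and the last sample assumes the parser is left at its default settings, which allow an internal DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class UndeclaredEntityDemo {
    // Parses the string with a default DOM parser and returns the root element's text.
    static String textOf(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // True if the document parses, false if the parser raises a SAXParseException.
    static boolean parses(String xml) throws Exception {
        try {
            textOf(xml);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Escaped ampersand: parses, and the text comes back with a plain '&'.
        System.out.println(textOf("<a>Tom &amp; Jerry</a>"));
        // Bare ampersand: rejected by the parser.
        System.out.println(parses("<a>Tom & Jerry</a>"));
        // Looks like a reference, but T is not declared: also rejected.
        System.out.println(parses("<a>AT&T;</a>"));
        // Same reference with T declared in an internal DTD.
        System.out.println(parses("<!DOCTYPE a [<!ENTITY T \"Tea\">]><a>AT&T;</a>"));
    }
}
```

The parser's default error handler may also print a "[Fatal Error]" line to standard error for the rejected documents.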