I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "the reference to entity must e
To complement @PSpeed's answer, here is a complete solution (SAX parser):
try {
    InputStream xmlStreamToParse = blob.getBinaryStream();
    // Clean: escape bare ampersands before the parser sees them
    BufferedReader br = new BufferedReader(new InputStreamReader(xmlStreamToParse));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = br.readLine()) != null) {
        // readLine() strips the line terminator, so add it back to keep tokens on separate lines apart
        sb.append(line.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1")).append('\n'); // or whatever you want to clean
    }
    InputStream stream = org.apache.commons.io.IOUtils.toInputStream(sb.toString(), "UTF-8");
    // Parsing
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    saxFactory.setNamespaceAware(true);
    SAXParser theParser = saxFactory.newSAXParser();
    XMLReader xmlReader = theParser.getXMLReader();
    LicenceXMLHandler licence = new LicenceXMLHandler();
    xmlReader.setContentHandler(licence);
    xmlReader.parse(new InputSource(stream));
} catch (SQLException | SAXException | IOException | ParserConfigurationException e) {
    log.error("Error: " + e);
}
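Since the question asks about a DOM parser, the same cleaning idea can be sketched there too. This is a minimal, self-contained example and not part of the original answer: the class name `AmpersandCleaner`, its methods, and the sample input are hypothetical, and the regex is a stricter negative-lookahead pattern swapped in for the one above (it escapes any `&` that does not start a named or numeric character reference). It works on an in-memory string instead of a `Blob`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
final class AmpersandCleaner {

    // Matches an '&' that is NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    private AmpersandCleaner() {}

    // Replaces every bare '&' with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Cleans the raw text, then hands it to the standard JAXP DOM parser.
    static Document parse(String rawXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        String cleaned = escapeBareAmpersands(rawXml);
        return builder.parse(new InputSource(new StringReader(cleaned)));
    }

    public static void main(String[] args) throws Exception {
        // The bare '&' would normally trigger a SAXParseException.
        Document doc = parse("<name>Tom & Jerry &lt;3</name>");
        System.out.println(doc.getDocumentElement().getTextContent()); // prints: Tom & Jerry <3
    }
}
```

Note that a regex like this is blind to XML structure, so it would also rewrite an `&` inside a CDATA section or a comment. If you control whatever produces the file, escaping at the source is the more reliable fix.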
Explanations: