org.xml.sax.SAXParseException: The reference to entity “T” must end with the ';' delimiter

北海茫月 2020-12-29 07:35

I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "the reference to entity must e

9 answers
  •  渐次进展
    2020-12-29 07:42

    To complement @PSpeed's answer, here is a complete solution (SAX parser):

        try {
            // Assumes `blob` is a java.sql.Blob holding the XML document
            InputStream xmlStreamToParse = blob.getBinaryStream();

            // Clean: read the XML as text (assumed to be UTF-8, matching the re-encoding below)
            BufferedReader br = new BufferedReader(new InputStreamReader(xmlStreamToParse, "UTF-8"));

            StringBuilder sb = new StringBuilder();

            String line;
            while ((line = br.readLine()) != null) {
                // Escape '&' characters that do not start an entity reference
                // (or whatever else you want to clean)
                sb.append(line.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1"));
                sb.append('\n'); // readLine() strips the line terminator, so restore it
            }

            InputStream stream = org.apache.commons.io.IOUtils.toInputStream(sb.toString(), "UTF-8");

            // Parsing
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();
            saxFactory.setNamespaceAware(true);
            SAXParser theParser = saxFactory.newSAXParser();
            XMLReader xmlReader = theParser.getXMLReader();
            LicenceXMLHandler licence = new LicenceXMLHandler(); // custom ContentHandler
            xmlReader.setContentHandler(licence);
            xmlReader.parse(new InputSource(stream));

        } catch (SQLException | SAXException | IOException | ParserConfigurationException e) {
            log.error("Error: " + e);
        }
    

    Explanation:

    • Get the Blob's content as an InputStream
    • Clean the content (here, the unescaped & characters)
    • Parse the cleaned stream (LicenceXMLHandler is the SAX content handler class)
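
The cleaning step can also be tried on its own, without a Blob or a custom handler. The following is only a minimal sketch, not the code from the answer above: the class name `AmpersandEscaper`, the method `escapeBareAmpersands`, and the stricter regex are assumptions made for illustration. It escapes every `&` that does not already start a named, decimal, or hexadecimal reference, then parses the result with the JDK's DOM parser, since the question mentions DOM.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, introduced only for this illustration.
class AmpersandEscaper {

    // Matches an '&' that is NOT the start of a named (&name;),
    // decimal (&#38;) or hexadecimal (&#x26;) reference.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z_][\\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with the predefined entity "&amp;".
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) throws Exception {
        // "AT&T " is the kind of input that triggers the SAXParseException
        // from the question title when parsed unmodified.
        String broken = "<company>AT&T &amp; partners</company>";
        String cleaned = escapeBareAmpersands(broken);

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(cleaned)));

        // The parser decodes &amp; back to '&' in the text content.
        System.out.println(doc.getDocumentElement().getTextContent());
        // prints: AT&T & partners
    }
}
```

One caveat with any regex-based cleanup of this kind: it also rewrites `&` inside CDATA sections, where the character is already legal, so it is best treated as a workaround for input you cannot fix at the source.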
