How can I parse an XML file with HTML tags inside a CDATA section?

梦如初夏 2020-12-22 00:27


    

        
1 answer
  • 2020-12-22 01:19

    Just read the CDATA as the element's text content. The parser hands it over verbatim and does not interpret the HTML tags inside it.

    Variant 1 (DOM):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // The class name is arbitrary; it only makes the snippet compilable.
    public class CdataDomExample {

        public void getCDATAFromHardcodedPathWithDom() {
            String yourSampleFile = "/path/toYour/sample/file.xml";
            String cdataNode = "extendedinfo"; // element that wraps the CDATA section
            try (InputStream in =
                    new BufferedInputStream(new FileInputStream(yourSampleFile))) {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                Document doc = builder.parse(in);
                NodeList elements = doc.getElementsByTagName(cdataNode);
                for (int i = 0; i < elements.getLength(); i++) {
                    Node e = elements.item(i);
                    // getTextContent() returns the CDATA content as plain text,
                    // including any whitespace around the CDATA section.
                    System.out.println(e.getTextContent());
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
    

    Variant 2 (StAX):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    // The class name is arbitrary; it only makes the snippet compilable.
    public class CdataStaxExample {

        public void getCDATAFromHardcodedPathWithStax() {
            String yourSampleFile = "/path/toYour/sample/file.xml";
            String cdataNode = "extendedinfo"; // element that wraps the CDATA section
            XMLStreamReader r = null;
            try (InputStream in =
                    new BufferedInputStream(new FileInputStream(yourSampleFile))) {
                XMLInputFactory factory = XMLInputFactory.newInstance();
                r = factory.createXMLStreamReader(in);
                while (r.hasNext()) {
                    switch (r.getEventType()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (cdataNode.equals(r.getName().getLocalPart())) {
                            // getElementText() reads up to the matching END_ELEMENT
                            // and returns the text, CDATA content included.
                            System.out.println(r.getElementText());
                        }
                        break;
                    default:
                        break;
                    }
                    r.next();
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                // XMLStreamReader is not AutoCloseable, so close it by hand.
                if (r != null) {
                    try {
                        r.close();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            }
        }
    }
    

    With /path/toYour/sample/file.xml containing:

    <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
    <root>
    <extendedinfo type="html">
        <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
    </extendedinfo>
    <extendedinfo type="html">
        <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
    </extendedinfo>
    </root>
    

    It will print:

    <table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>
    
    
    <tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>
    
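    The blank lines in that output come from the whitespace around each CDATA section, which is part of the element's text content. If only the HTML fragment is wanted, the text can be trimmed. The following is a minimal sketch, not part of the original answer: the class name `TrimmedCdataText` and the helper `trimmedTexts` are hypothetical, and the XML is parsed from an in-memory string instead of a file so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TrimmedCdataText {

    // Hypothetical helper: returns the trimmed text content of every
    // element named tagName in the given XML string.
    public static List<String> trimmedTexts(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList elements = doc.getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            // trim() drops the line breaks and indentation around the CDATA section
            result.add(elements.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n"
                + "<extendedinfo type=\"html\">\n"
                + "    <![CDATA[<tr><td>Set_Temperature = 23</td></tr>]]>\n"
                + "</extendedinfo>\n"
                + "</root>";
        for (String html : trimmedTexts(xml, "extendedinfo")) {
            System.out.println(html);
        }
    }
}
```

    Note that trim() also removes leading or trailing whitespace inside the CDATA section itself, which may matter if that whitespace is significant.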

    An interesting alternative using JAXB is given in the question "Retrieve value from CDATA".

    An example of how to extract only the CDATA sections is given in the question "Unable to check CDATA in XML using XMLEventReader in Stax".
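    The linked question works with StAX's XMLEventReader. As a different route to the same goal, here is a minimal DOM-based sketch, not taken from either linked answer. It assumes the parser is left non-coalescing (the default for DocumentBuilderFactory), so that CDATA sections show up as separate CDATA_SECTION_NODE nodes. The class name `CdataOnly` and the method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataOnly {

    // Hypothetical helper: collects the content of every CDATA section in
    // document order and ignores ordinary text nodes.
    public static List<String> cdataSections(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Coalescing would merge CDATA sections into adjacent text nodes,
        // so it is switched off explicitly (this is also the default).
        factory.setCoalescing(false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        collect(doc.getDocumentElement(), result);
        return result;
    }

    // Depth-first walk over the tree, keeping only CDATA section nodes.
    private static void collect(Node node, List<String> result) {
        if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
            result.add(node.getNodeValue());
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><extendedinfo>plain text<![CDATA[<b>bold</b>]]></extendedinfo></root>";
        for (String cdata : cdataSections(xml)) {
            System.out.println(cdata);
        }
    }
}
```

    Unlike getTextContent(), this skips the whitespace and plain text around the CDATA section, at the cost of loading the whole document into memory.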
