Using JAXB to extract content of several XML elements as text

Posted by ⅰ亾dé卋堺 on 2020-08-25 08:27:40

Question


I have the following XML file:

<items>
   <title><a href="blabla">blabla</a></title>
   <text><a href="123">123</a></text>
</items>

I'm unmarshalling the XML into the following Java object using JAXB and the @XmlAnyElement annotation, with two classes implementing DomHandler. I want to extract the inner XML of the "title" and "text" elements as Strings.

public class Item implements Serializable {
    private String title;
    private String text;

    public String getTitle() {
        return title;
    }
    @XmlAnyElement(value = TitleHandler.class)
    public void setTitle(String title) {
        this.title = title;
    }
    public String getText() {
        return text;
    }
    @XmlAnyElement(value = TextHandler.class)
    public void setText(String text) {
        this.text = text;
    }
}

But when I put breakpoints in the method "String getElement(StreamResult rt)" of TitleHandler and TextHandler, both elements are unmarshalled with TextHandler.class: the "title" element uses TextHandler instead of TitleHandler. Any help will be greatly appreciated.

UPDATE: Usage constraint for the @XmlAnyElement annotation: there can be only one @XmlAnyElement-annotated JavaBean property in a class and its superclasses.


Answer 1:


The @XmlAnyElement annotation is used as a catch-all for elements in the XML input that aren't mapped by name to some specific property. That's why there can be only one such annotation per class (including inherited properties). What you want is this:

public class Item implements Serializable {
    private String title;
    private String text;

    public String getTitle() {
        return title;
    }
    @XmlElement(name = "title")
    @XmlJavaTypeAdapter(value = TitleHandler.class)
    public void setTitle(String title) {
        this.title = title;
    }
    public String getText() {
        return text;
    }
    @XmlElement(name = "text")
    @XmlJavaTypeAdapter(value = TextHandler.class)
    public void setText(String text) {
        this.text = text;
    }
}

The @XmlElement annotation indicates that the corresponding property is mapped to elements with that name. So the Java text property derives from the XML <text> element, and the title property from the <title> element. Since the names of the properties and the elements are the same, this is also the default behavior without the @XmlElement annotations, so you could leave them out.

To convert the XML content to a String instead of an actual structure (like a Title class or Text class), you'll need an adapter. That's what the @XmlJavaTypeAdapter annotation is for: it specifies how marshalling and unmarshalling must be handled for that property.

See this useful answer: https://stackoverflow.com/a/18341694/630136

Here is an example of how you could implement TitleHandler:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TitleHandler extends XmlAdapter<Object, String> {

    /**
     * Factory for building DOM documents.
     */
    private final DocumentBuilderFactory docBuilderFactory;
    /**
     * Factory for building transformers.
     */
    private final TransformerFactory transformerFactory;

    public TitleHandler() {
        docBuilderFactory = DocumentBuilderFactory.newInstance();
        transformerFactory = TransformerFactory.newInstance();
    }

    @Override
    public String unmarshal(Object v) throws Exception {
        // The provided Object is a DOM Element
        Element titleElement = (Element) v;
        // Getting the "a" child elements
        NodeList anchorElements = titleElement.getElementsByTagName("a");
        // If there's none or multiple, return empty string
        if (anchorElements.getLength() != 1) {
            return "";
        }
        Element anchor = (Element) anchorElements.item(0);
        // Creating a DOMSource as input for the transformer
        DOMSource source = new DOMSource(anchor);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // This is necessary to avoid the <?xml ...?> prolog
        transformer.setOutputProperty("omit-xml-declaration", "yes");
        // Transform to a StringWriter
        StringWriter stringWriter = new StringWriter();
        StreamResult result = new StreamResult(stringWriter);
        transformer.transform(source, result);
        // Returning result as string
        return stringWriter.toString();
    }

    @Override
    public Object marshal(String v) throws Exception {
        // DOM document builder
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // Creating a new empty document
        Document doc = docBuilder.newDocument();
        // Creating the <title> element
        Element titleElement = doc.createElement("title");
        // Setting as the document root
        doc.appendChild(titleElement);
        // Creating a DOMResult as output for the transformer
        DOMResult result = new DOMResult(titleElement);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // String reader from the input and source
        StringReader stringReader = new StringReader(v);
        StreamSource source = new StreamSource(stringReader);
        // Transforming input string to the DOM
        transformer.transform(source, result);
        // Return DOM root element (<title>) for JAXB marshalling to XML
        return doc.getDocumentElement();
    }

}

If the type for unmarshalling input/marshalling output is left as Object, JAXB will provide DOM nodes. The above uses XSLT transformations (though without an actual stylesheet, just an "identity" transform) to turn the DOM input into a String and vice-versa. I've tested it on a minimal input document and it works for both XML to an Item object and the other way around.
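To see the identity-transform round trip in isolation, here is a minimal sketch that uses only the JDK's built-in XML APIs and no JAXB at all. The class name IdentityTransformDemo and the helper names domToString and stringToDom are made up for this illustration; they are not part of the adapter above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class: shows the DOM <-> String conversion without JAXB.
class IdentityTransformDemo {

    /** Serializes a DOM node to a String with the identity transform. */
    static String domToString(Node node) {
        try {
            // newTransformer() without a stylesheet copies input to output unchanged
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Suppress the <?xml ...?> prolog in the output
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize DOM node", e);
        }
    }

    /** Parses a well-formed XML string into a DOM tree and returns its root element. */
    static Element stringToDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // The identity transform writes the parsed input into the empty document
            transformer.transform(new StreamSource(new StringReader(xml)), new DOMResult(doc));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML string", e);
        }
    }

    public static void main(String[] args) {
        Element anchor = stringToDom("<a href=\"blabla\">blabla</a>");
        System.out.println(anchor.getAttribute("href"));
        System.out.println(domToString(anchor));
    }
}
```

Note that the String-to-DOM direction only works when the string is a single well-formed element, which is why the first adapter version expects exactly one <a> child.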

EDIT:

The following version will handle any XML content in <title> rather than expecting a single <a> element. You'll probably want to turn this into an abstract class and then have TitleHandler and TextHandler extend it, so that the currently hardcoded <title> tags are provided by the implementation.

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TitleHandler extends XmlAdapter<Object, String> {

    /**
     * Factory for building DOM documents.
     */
    private final DocumentBuilderFactory docBuilderFactory;
    /**
     * Factory for building transformers.
     */
    private final TransformerFactory transformerFactory;

    /**
     * XSLT that strips the root element, so that only the content of the given element is kept.
     */
    private final static String UNMARSHAL_XSLT = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">\n" +
"\n" +
"    <xsl:output method=\"xml\" omit-xml-declaration=\"yes\" />\n" +
"\n" +
"    <xsl:template match=\"/*\">\n" +
"      <xsl:apply-templates select=\"@*|node()\"/>\n" +
"    </xsl:template>\n" +
"\n" +
"    <xsl:template match=\"@*|node()\">\n" +
"        <xsl:copy>\n" +
"            <xsl:apply-templates select=\"@*|node()\"/>\n" +
"        </xsl:copy>\n" +
"    </xsl:template>\n" +
"    \n" +
"</xsl:transform>";

    public TitleHandler() {
        docBuilderFactory = DocumentBuilderFactory.newInstance();
        transformerFactory = TransformerFactory.newInstance();
    }

    @Override
    public String unmarshal(Object v) throws Exception {
        // The provided Object is a DOM Element
        Element rootElement = (Element) v;
        // Creating a DOMSource as input for the transformer
        DOMSource source = new DOMSource(rootElement);
        // Creating a transformer that will strip away the root element
        StreamSource xsltSource = new StreamSource(new StringReader(UNMARSHAL_XSLT));
        Transformer transformer = transformerFactory.newTransformer(xsltSource);
        // Transform to a StringWriter
        StringWriter stringWriter = new StringWriter();
        StreamResult result = new StreamResult(stringWriter);
        transformer.transform(source, result);
        // Returning result as string
        return stringWriter.toString();
    }

    @Override
    public Object marshal(String v) throws Exception {
        // DOM document builder
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // Creating a new empty document
        Document doc = docBuilder.newDocument();
        // Creating a DOMResult as output for the transformer
        DOMResult result = new DOMResult(doc);
        // Default transformer: identity transformer (doesn't alter input)
        Transformer transformer = transformerFactory.newTransformer();
        // String reader from the input and source
        StringReader stringReader = new StringReader("<title>" + v + "</title>");
        StreamSource source = new StreamSource(stringReader);
        // Transforming input string to the DOM
        transformer.transform(source, result);
        // Return DOM root element for JAXB marshalling to XML
        return doc.getDocumentElement();
    }

}
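The root-stripping stylesheet and the idea of passing in the element name instead of hardcoding <title> can be tried out without JAXB. The following is only a sketch under a few assumptions: the class InnerXmlDemo and the methods parse, innerXml and wrap are hypothetical names for this illustration, and the stylesheet is a slightly reduced variant that applies templates only to the root's child nodes (the root's own attributes are not copied).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: extracts inner XML and wraps it again, JDK only.
class InnerXmlDemo {

    /** Stylesheet that drops the root element and copies everything inside it. */
    private static final String STRIP_ROOT_XSLT =
          "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/*\"><xsl:apply-templates select=\"node()\"/></xsl:template>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:transform>";

    /** Parses a well-formed XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML string", e);
        }
    }

    /** Returns the content of the element as a String, without the element's own tags. */
    static String innerXml(Element element) {
        try {
            StreamSource xslt = new StreamSource(new StringReader(STRIP_ROOT_XSLT));
            Transformer transformer = TransformerFactory.newInstance().newTransformer(xslt);
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(element), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract inner XML", e);
        }
    }

    /** Wraps XML content in an element whose name is supplied by the caller. */
    static Element wrap(String elementName, String content) {
        // The content must be well-formed once it is surrounded by the wrapper tags
        return parse("<" + elementName + ">" + content + "</" + elementName + ">");
    }

    public static void main(String[] args) {
        Element title = parse("<title><a href=\"blabla\">blabla</a></title>");
        String inner = innerXml(title);
        System.out.println(inner);
        // The same content can be wrapped under a different element name
        System.out.println(wrap("text", inner).getTagName());
    }
}
```

In an abstract adapter, the elementName argument of wrap would presumably come from an abstract method that TitleHandler and TextHandler each implement, while innerXml could be shared unchanged.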



Answer 2:


As you already found out, @XmlAnyElement does not fit your needs. Instead of using a DomHandler, I would take the straightforward route and let JAXB do all the work for you.

Let class Item have two elements title and text:

public class Item {
    private Title title;
    private Text text;

    @XmlElement(name = "title")
    public Title getTitle() {
        return title;
    }
    public void setTitle(Title title) {
        this.title = title;
    }

    @XmlElement(name = "text")
    public Text getText() {
        return text;
    }
    public void setText(Text text) {
        this.text = text;
    }
}

Create classes Title and Text having only the element a:

public class Title {
    private A a;

    @XmlElement(name = "a")
    public A getA() {
        return a;
    }
    public void setA(A a) {
        this.a = a;
    }
}

public class Text {
    private A a;

    @XmlElement(name = "a")
    public A getA() {
        return a;
    }
    public void setA(A a) {
        this.a = a;
    }
}

Create class A having attribute href and a value for the inner text (by @XmlValue):

public class A {
    private String href;
    private String value;

    @XmlAttribute(name = "href")
    public String getHref() {
        return href;
    }
    public void setHref(String href) {
        this.href = href;
    }

    @XmlValue
    public String getValue() {
        return value;
    }
    public void setValue(String value) {
        this.value = value;
    }
}


Source: https://stackoverflow.com/questions/43776239/using-jaxb-to-extract-content-of-several-xml-elements-as-text
