DomHandler to capture text for multiple records

Submitted by 霸气de小男生 on 2020-01-15 11:14:14

Question


I am attempting to use @XmlAnyElement with a DomHandler to capture the unparsed text within a particular field, as in this example from Blaise Doughan. But when I parse multiple customers, the contents of the bio fields from all previous records keep being sent to my DomHandler!

Here is the example document I am trying to parse:

<?xml version="1.0" encoding="UTF-8"?>
<customers>
   <customer>
     <name>Jane Doe</name>
     <bio>
       <html>Jane's bio</html>
     </bio>
   </customer>
   <customer>
     <name>John Doe</name>
     <bio>
       <html>John's bio</html>
     </bio>
   </customer>
</customers>

But the output is:

 Name:  Jane Doe 
 Bio:   <html>Jane's bio</html>
 Name:  John Doe 
 Bio:   <html>Jane's bio</html>

BioHandler (unchanged from the previous example)

package blog.domhandler;

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.bind.ValidationEventHandler;
import javax.xml.bind.annotation.DomHandler;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BioHandler implements DomHandler<String, StreamResult> {

    private static final String BIO_START_TAG = "<bio>";
    private static final String BIO_END_TAG = "</bio>";

    private StringWriter xmlWriter = new StringWriter();

    public StreamResult createUnmarshaller(ValidationEventHandler errorHandler) {
        return new StreamResult(xmlWriter);
    }

    public String getElement(StreamResult rt) {
        String xml = rt.getWriter().toString();
        int beginIndex = xml.indexOf(BIO_START_TAG) + BIO_START_TAG.length();
        int endIndex = xml.indexOf(BIO_END_TAG);
        return xml.substring(beginIndex, endIndex);
    }

    public Source marshal(String n, ValidationEventHandler errorHandler) {
        try {
            String xml = BIO_START_TAG + n.trim() + BIO_END_TAG;
            StringReader xmlReader = new StringReader(xml);
            return new StreamSource(xmlReader);
        } catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

}

Customer (unchanged from the previous example)

package blog.domhandler;

import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement
@XmlType(propOrder={"name", "bio"})
public class Customer {

    private String name;
    private String bio;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @XmlAnyElement(BioHandler.class)
    public String getBio() {
        return bio;
    }

    public void setBio(String bio) {
        this.bio = bio;
    }

}

Customers

package blog.domhandler;

import java.util.List;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Customers {

    private List<Customer> customers;

    // The JAXB property is named "customer" (from getCustomer/setCustomer),
    // so each repeated <customer> element is unmarshalled into this list.
    public List<Customer> getCustomer() {
        return customers;
    }

    public void setCustomer(List<Customer> c) {
        this.customers = c;
    }

}

Demo (driver)

package blog.domhandler;

import java.io.File;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Customers.class);

        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Customers customers = (Customers) unmarshaller.unmarshal(new File("src/blog/domhandler/input.xml"));

        for (Customer customer : customers.getCustomer()) {
            System.out.println("Name:  " + customer.getName());
            System.out.println("Bio:   " + customer.getBio());
        }

    }
}

When I place a breakpoint in BioHandler.getElement(), I see that the first time it's called, the local variable xml holds

<?xml version="1.0" encoding="UTF-8"?><bio><html>Jane's bio</html>
    </bio>

while the second time it's called, xml holds

<?xml version="1.0" encoding="UTF-8"?><bio><html>Jane's bio</html>
    </bio><?xml version="1.0" encoding="UTF-8"?><bio><html>John's bio</html>
    </bio>

Is there some way to indicate to the parser that this content should be discarded after each call to BioHandler.getElement()?
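The second-call buffer also explains why John's record prints Jane's bio rather than both bios: getElement() uses indexOf(), which finds the first <bio> and the first </bio> in the accumulated text. The sketch below replays that substring logic on the second-call buffer shown above (whitespace between the tags dropped). The class name AccumulatedBufferDemo is hypothetical, chosen only for illustration.

```java
public class AccumulatedBufferDemo {

    private static final String BIO_START_TAG = "<bio>";
    private static final String BIO_END_TAG = "</bio>";

    // Same extraction logic as BioHandler.getElement(), applied to a plain String
    public static String extract(String xml) {
        int beginIndex = xml.indexOf(BIO_START_TAG) + BIO_START_TAG.length();
        int endIndex = xml.indexOf(BIO_END_TAG);
        return xml.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        // Buffer contents on the second call: both records' <bio> elements, back to back
        String accumulated =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?><bio><html>Jane's bio</html></bio>"
            + "<?xml version=\"1.0\" encoding=\"UTF-8\"?><bio><html>John's bio</html></bio>";

        // indexOf() matches the FIRST <bio> and the FIRST </bio>,
        // so the first record's bio is returned again
        System.out.println(extract(accumulated)); // prints <html>Jane's bio</html>
    }
}
```

In other words, the handler is not merging the records; it is simply always re-reading the first one from a buffer that nothing ever clears.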


Answer 1:


It turns out my question was answered by the first comment on the blog post this example is taken from: BioHandler.createUnmarshaller() has to clear the shared StringWriter before wrapping it in a new StreamResult, like this:

public StreamResult createUnmarshaller(ValidationEventHandler errorHandler) {
    // Discard whatever the previous record wrote into the shared writer
    xmlWriter.getBuffer().setLength(0);
    return new StreamResult(xmlWriter);
}
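The reset works because createUnmarshaller() is called once per <bio> element (the debugger output above shows one XML declaration per record), while the StringWriter field belongs to a handler instance that appears to be reused across records. Below is a JAXB-free simulation, offered as a sketch rather than a description of JAXB internals: BioWriterStrategies and its methods are hypothetical names, and an identity Transformer stands in for however JAXB actually writes the element into the Result. It compares the question's shared writer, the answer's reset, and an alternative that is not from the answer: allocating a new StringWriter on every call so the handler keeps no state.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BioWriterStrategies {

    // SHARED mirrors the question, RESET_SHARED mirrors the answer's fix,
    // FRESH_PER_CALL is an alternative that keeps no buffer in the handler
    public enum Strategy { SHARED, RESET_SHARED, FRESH_PER_CALL }

    private static final String BIO_START_TAG = "<bio>";
    private static final String BIO_END_TAG = "</bio>";

    private final StringWriter xmlWriter = new StringWriter();
    private final Strategy strategy;

    public BioWriterStrategies(Strategy strategy) {
        this.strategy = strategy;
    }

    // Counterpart of BioHandler.createUnmarshaller(), minus the ValidationEventHandler
    StreamResult createResult() {
        switch (strategy) {
            case RESET_SHARED:
                xmlWriter.getBuffer().setLength(0); // clear output left by the previous record
                return new StreamResult(xmlWriter);
            case FRESH_PER_CALL:
                return new StreamResult(new StringWriter()); // each Result owns its buffer
            default:
                return new StreamResult(xmlWriter); // SHARED: keeps appending
        }
    }

    // Same substring logic as BioHandler.getElement()
    String getElement(StreamResult rt) {
        String xml = rt.getWriter().toString();
        int beginIndex = xml.indexOf(BIO_START_TAG) + BIO_START_TAG.length();
        int endIndex = xml.indexOf(BIO_END_TAG);
        return xml.substring(beginIndex, endIndex);
    }

    // Stand-in for JAXB: an identity Transformer copies one <bio> element into the Result
    public String unmarshalBio(String bioXml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StreamResult result = createResult();
        identity.transform(new StreamSource(new StringReader(bioXml)), result);
        return getElement(result);
    }

    public static void main(String[] args) throws Exception {
        String[] bios = {
            "<bio><html>Jane's bio</html></bio>",
            "<bio><html>John's bio</html></bio>"
        };
        for (Strategy s : Strategy.values()) {
            BioWriterStrategies handler = new BioWriterStrategies(s);
            for (String bio : bios) {
                System.out.println(s + "  Bio: " + handler.unmarshalBio(bio));
            }
        }
    }
}
```

Running main should show SHARED returning Jane's bio for both records, matching the question's output, while RESET_SHARED and FRESH_PER_CALL return each record's own bio. The fresh-writer variant costs one small allocation per element but removes the need to remember to reset shared state.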


Source: https://stackoverflow.com/questions/23550197/domhandler-to-capture-text-for-multiple-records
