Java XML Parsing and original byte offsets

Submitted by 主宰稳场 on 2019-12-05 18:24:07

Question


I'd like to parse some well-formed XML into a DOM, but I'd also like to know the offset of each node's tag in the original media.

For example, if I had an XML document with the content something like:

<html>
<body>
<div>text</div>
</body>
</html>

I'd like to know that the <div> node starts at offset 13 in the original media, and (more importantly) that "text" starts at offset 18.

Is this possible with standard Java XML parsers? JAXB? If no solution is easily available, what type of changes are necessary along the parsing path to make this possible?


Answer 1:


The SAX API provides a rather obscure mechanism for this - the org.xml.sax.Locator interface. When you use the SAX API, you subclass DefaultHandler and pass it to the SAX parse methods, and the SAX parser implementation is supposed to inject a Locator into your DefaultHandler via setDocumentLocator(). As parsing proceeds, the various callback methods on your ContentHandler are invoked (e.g. startElement()), at which point you can consult the Locator to find the current parsing position (via getLineNumber() and getColumnNumber()).

Technically, this is optional functionality, but the javadoc says that implementations are "strongly encouraged" to provide it, so you can likely assume the SAX parser built into JavaSE will do it.
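To make the Locator mechanism concrete, here is a minimal sketch. The class and method names (LocatorDemo, Hit, locate, toOffset) are made up for illustration and are not part of any API. Two caveats: per the Locator Javadoc, the reported position is that of the first character after the text of the event, so for startElement() it points just past the start tag rather than at its opening "<"; and the Locator yields a line and column, not an offset. The toOffset helper converts a 1-based line/column pair to a character offset, assuming the parser counts columns in Java chars and that lines end in LF or CRLF.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** One recorded start tag: element name plus the Locator position at startElement(). */
    static final class Hit {
        final String name;
        final int line;
        final int column;

        Hit(String name, int line, int column) {
            this.name = name;
            this.line = line;
            this.column = column;
        }
    }

    /** Parses the XML with SAX and records the Locator position for every start tag. */
    static List<Hit> locate(String xml) throws Exception {
        final List<Hit> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator; // stays null if the parser does not supply one

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (locator != null) {
                    hits.add(new Hit(qName, locator.getLineNumber(), locator.getColumnNumber()));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return hits;
    }

    /**
     * Converts a 1-based line/column pair into a 0-based character offset in text.
     * Assumption: lines are terminated by '\n' (a preceding '\r' is harmless).
     */
    static int toOffset(String text, int line, int column) {
        int lineStart = 0;
        for (int i = 1; i < line; i++) {
            int newline = text.indexOf('\n', lineStart);
            if (newline < 0) {
                throw new IllegalArgumentException("line out of range: " + line);
            }
            lineStart = newline + 1;
        }
        return lineStart + column - 1;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html>\n<body>\n<div>text</div>\n</body>\n</html>";
        for (Hit h : locate(xml)) {
            System.out.println(h.name + ": start tag ends at line " + h.line
                    + ", column " + h.column
                    + " (character offset " + toOffset(xml, h.line, h.column) + ")");
        }
    }
}
```

These are character offsets into the decoded text. Turning them into byte offsets would additionally require knowing the document's encoding, which this sketch does not attempt.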

Of course, this does mean using the SAX API, which is no one's idea of fun, but I can't see a way of accessing this information using a higher-level API.
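Since the question ultimately wants a DOM, one possible workaround is to build the DOM yourself from the SAX callbacks and attach the Locator position to each element as it is created. The following is only a sketch under simplifying assumptions: PositionalDomBuilder and the POSITION_KEY user-data key are hypothetical names, namespaces are ignored, and comments, processing instructions and CDATA boundaries are dropped. It relies on the DOM Level 3 methods Node.setUserData() and Node.getUserData().

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class PositionalDomBuilder {

    // Hypothetical user-data key chosen for this sketch; the value is "line:column".
    static final String POSITION_KEY = "startTagEnd";

    static Document parse(String xml) throws Exception {
        final Document doc =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // Stack of currently open nodes; the Document itself sits at the bottom.
        final Deque<Node> open = new ArrayDeque<>();
        open.push(doc);

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator; // stays null if the parser does not supply one

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Element element = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    element.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                if (locator != null) {
                    // Position just past the start tag, as reported by the Locator.
                    element.setUserData(POSITION_KEY,
                            locator.getLineNumber() + ":" + locator.getColumnNumber(), null);
                }
                open.peek().appendChild(element);
                open.push(element);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text; each call adds a text node.
                open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html>\n<body>\n<div>text</div>\n</body>\n</html>");
        Node div = doc.getElementsByTagName("div").item(0);
        System.out.println("div position: " + div.getUserData(POSITION_KEY));
    }
}
```

After parsing, any code that walks the DOM can call getUserData(POSITION_KEY) on an element to recover where its start tag ended in the source.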

edit: Found this example.




Answer 2:


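Use XMLStreamReader and its getLocation() method, which returns a Location object. location.getCharacterOffset() gives the offset of the current location. Per the Location Javadoc, this is a byte offset when the input source is a file or byte stream, and a character offset when the input is a character source such as the FileReader used here. It is worth verifying the reported values against the parser implementation you actually use.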

import java.io.FileReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class Runner {

    public static void main(String[] argv) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // A Reader is a character source, so per the Location Javadoc
            // the offsets printed below are character offsets, not byte offsets.
            XMLStreamReader streamReader = factory.createXMLStreamReader(
                    new FileReader("D:\\BigFile.xml"));

            while (streamReader.hasNext()) {
                streamReader.next();
                if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT) {
                    // The Location is only valid until the next call to next().
                    Location location = streamReader.getLocation();
                    System.out.println("offset: " + location.getCharacterOffset());
                }
            }
            streamReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


Source: https://stackoverflow.com/questions/3507350/java-xml-parsing-and-original-byte-offsets
