SAX: How to get the content of an element

Posted by 最后都变了 on 2020-01-22 20:19:28

Question


I'm having some trouble understanding how to parse XML structures with SAX. Let's say there is the following XML:

<root>
  <element1>Value1</element1>
  <element2>Value2</element2>
</root>

and a String variable myString.

Just going through the document with the methods startElement(), endElement() and characters() is easy. But I don't understand how I can achieve the following:

If the current element is element1, store its value Value1 in myString. As far as I understand there is nothing like:

if (qName.equals("element1")) myString = qName.getValue();

Guess I'm just thinking too complicated :-)

Robert


Answer 1:


With SAX you need to maintain your own stack. You can do something like this for very basic processing:

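// Fields on the handler (a DefaultHandler subclass)
private boolean inElement1 = false;
private StringBuilder element1Content;

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    if (qName.equals("element1")) {
        inElement1 = true;
        element1Content = new StringBuilder();
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    if (inElement1) {
        // May be called several times for one element, so append rather than assign
        element1Content.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName) {
    // Check for element1 here (not element2), so the flag is reset when element1 closes
    if (qName.equals("element1")) {
        inElement1 = false;
        processElement1Content(element1Content.toString());
    }
}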

If you want code as in your example then you need to use the DOM model rather than SAX. DOM is easier to code up but is generally slower and more memory expensive than SAX.

I recommend using a third-party library rather than the built-in Java XML libraries for DOM manipulation. Dom4J seems pretty good but there are probably other libraries out there too.
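For comparison, here is a minimal sketch of the DOM route, which is the closest thing to the `getValue()`-style call in the question. To stay self-contained it uses the JDK's built-in `javax.xml.parsers` and `org.w3c.dom` classes rather than a third-party library; the class name `DomElement1Reader` and the method `readElement1` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomElement1Reader {

    // Returns the text of the first <element1>, or null if there is none.
    static String readElement1(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory here, which is the cost of DOM.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList matches = doc.getElementsByTagName("element1");
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>Value1</element1><element2>Value2</element2></root>";
        String myString = readElement1(xml);
        System.out.println(myString); // prints "Value1"
    }
}
```

This trades memory for convenience: fine for small documents, but for very large files the SAX approach above is usually the better fit.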




Answer 2:


This solution works for a single element with text content. When element1 has sub-elements of its own, some more work is needed. Brian's remark is a very important one. When you have multiple elements or want a more generic solution, this might help you. I tested it with a 300+ MB XML file and it's still very fast:

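final StringBuilder builder = new StringBuilder();
// Note: XMLReaderFactory is deprecated as of Java 9; SAXParserFactory is the recommended replacement.
XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();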

DefaultHandler handler = new DefaultHandler() {
    boolean isParsing = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if ("element1".equals(localName)) {
            isParsing = true;
        }
        if (isParsing) {
            builder.append("<" + qName + ">");
        }
    }

    @Override
    public void characters(char[] chars, int i, int i1) throws SAXException {
        if (isParsing) {
            builder.append(new String(chars, i, i1));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (isParsing) {
            builder.append("</" + qName + ">");
        }
        if ("element1".equals(localName)) {
            isParsing = false;
        }
    }
};

saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);

// input is the XML file to parse (a File or a path String)
saxXmlReader.parse(new InputSource(new FileInputStream(input)));



Answer 3:


You should record the contents via characters(), append to a StringBuilder for each invocation and only store the concatenated value upon the endElement() call.

Why? Because characters() can be called multiple times for a single element's content, with each call delivering a successive chunk of that text.
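A minimal, self-contained sketch of that pattern, applied to the XML from the question. The class names `Element1Extractor` and `Element1Handler` and the method `extractElement1` are made up for illustration. It assumes the default, non-namespace-aware `SAXParserFactory`, so the element name arrives in `qName`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class names, chosen for this example only.
class Element1Extractor {

    // Keeps the text of the most recently closed <element1>.
    static final class Element1Handler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inElement1 = false;
        private String myString; // stays null until an </element1> is reached

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("element1".equals(qName)) {
                inElement1 = true;
                buffer.setLength(0); // start with an empty buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inElement1) {
                // The parser may deliver the text in several chunks, so append.
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("element1".equals(qName)) {
                inElement1 = false;
                myString = buffer.toString(); // only now is the text complete
            }
        }

        String getMyString() {
            return myString;
        }
    }

    static String extractElement1(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Element1Handler handler = new Element1Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getMyString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>Value1</element1><element2>Value2</element2></root>";
        System.out.println(extractElement1(xml)); // prints "Value1"
    }
}
```

Note that this flat boolean flag is only enough when element1 contains plain text. If element1 can be nested inside itself, a depth counter or a stack of element names would be needed instead.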



Source: https://stackoverflow.com/questions/4119870/sax-how-to-get-the-content-of-an-element
