Question
I have a file like this:
<root>
<transaction ts="1">
<abc><def></def></abc>
</transaction>
<transaction ts="2">
<abc><def></def></abc>
</transaction>
</root>
I have a condition that says: if ts="2", then do something. The problem is that when the parser finds ts="1", it still scans through the <abc><def> tags before it reaches <transaction ts="2">.
Is there a way, when the condition doesn't match, to make the parser stop there and look for the next transaction tag directly?
Answer 1:
A SAX parser must scan through all subtrees (like your "<abc><def></def></abc>") to know where the next element starts. There is no way around it, which is also the reason why you cannot parallelize an XML parser for a single XML document.
The only two ways of tuning I can think of in your case:
1) If you have many XML documents to parse, you can run one parser for each document in its own thread. This would at least parallelize the overall work and utilize all the CPUs and cores you have available.
2) If you just need to read up to a certain condition (like the <transaction ts="2"> you mentioned), you can stop parsing as soon as that condition is reached. If stopping the parser would help, the way to do this is to throw an exception.
Your implementation of startElement within the ContentHandler would look like this:
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
    if (atts == null) return;
    // localName is empty when the parser is not namespace-aware, so fall back to qName
    String name = (localName == null || localName.isEmpty()) ? qName : localName;
    if ("transaction".equals(name) && "2".equals(atts.getValue("ts"))) {
        // TODO: Whatever should happen when condition is reached
        throw new SAXException("Condition reached. Just skip rest of parsing");
    }
}
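On the calling side, the exception has to be caught and told apart from a genuine parse error. The following is a minimal sketch of one way to do that, assuming the JDK's built-in SAX parser; the names StopEarlyDemo, StopParsingException and parseUntilMatch are made up for illustration and are not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class StopEarlyDemo {

    // Hypothetical marker exception: lets the caller distinguish a deliberate stop from a real error.
    static final class StopParsingException extends SAXException {
        StopParsingException(String message) {
            super(message);
        }
    }

    static final class Handler extends DefaultHandler {
        int transactionsSeen = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            // A default SAXParserFactory is not namespace-aware, so qName carries the tag name.
            if (!"transaction".equals(qName)) return;
            transactionsSeen++;
            if ("2".equals(atts.getValue("ts"))) {
                throw new StopParsingException("ts=2 reached, stopping");
            }
        }
    }

    // Returns true if parsing was stopped deliberately, false if the document was read to the end.
    static boolean parseUntilMatch(String xml, Handler handler) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return false;
        } catch (StopParsingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        Handler handler = new Handler();
        boolean stopped = parseUntilMatch(xml, handler);
        System.out.println("stopped early: " + stopped + ", transactions seen: " + handler.transactionsSeen);
    }
}
```

Using a dedicated subclass keeps malformed-XML errors, which arrive as other SAXException types, from being silently swallowed by the catch block.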
Answer 2:
Is there a way, when the condition doesn't match, to make the parser stop there and look for the next transaction tag directly?
No. You'll have to write your SAX handler so that it knows when to ignore the tags inside the unwanted transaction block. That said, you'll probably find that switching to StAX makes this kind of thing easier than SAX does.
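To illustrate why StAX can feel easier here, the sketch below uses the standard javax.xml.stream pull API: because your code drives the loop, an unwanted transaction can be drained in one small helper. The parser still reads every tag; only your handling logic is bypassed. The class and method names (StaxSkipDemo, skipElement, childrenOfMatching) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxSkipDemo {

    // Consumes events up to the END_ELEMENT that matches the current START_ELEMENT.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Collects the names of elements nested inside transactions whose ts attribute equals wantedTs.
    static List<String> childrenOfMatching(String xml, String wantedTs) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        boolean insideWanted = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("transaction".equals(reader.getLocalName())) {
                        if (wantedTs.equals(reader.getAttributeValue(null, "ts"))) {
                            insideWanted = true;
                        } else {
                            skipElement(reader); // jump past the whole unwanted block
                        }
                    } else if (insideWanted) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "transaction".equals(reader.getLocalName())) {
                    insideWanted = false;
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        System.out.println(childrenOfMatching(xml, "2"));
    }
}
```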
Answer 3:
The SAX parser always calls your callbacks for each XML element.
You can solve your problem by setting a field, isIgnoreCurrentTransaction, once you detect the condition to ignore. Then, in your other SAX callbacks, check isIgnoreCurrentTransaction and simply do nothing in that case.
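A minimal sketch of that field-based approach might look like the following. It assumes a default (non-namespace-aware) SAXParserFactory, and it invents an id attribute on def purely so there is something to collect; IgnoreFlagDemo and collectDefIds are likewise hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class IgnoreFlagDemo {

    static final class Handler extends DefaultHandler {
        private boolean isIgnoreCurrentTransaction = false;
        final List<String> defIds = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("transaction".equals(qName)) {
                // Decide once per transaction whether its content should be ignored.
                isIgnoreCurrentTransaction = !"2".equals(atts.getValue("ts"));
                return;
            }
            if (isIgnoreCurrentTransaction) {
                return; // the callback still fires, but no work is done
            }
            if ("def".equals(qName)) {
                defIds.add(atts.getValue("id")); // "id" is an illustrative attribute
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("transaction".equals(qName)) {
                isIgnoreCurrentTransaction = false; // reset when the transaction closes
            }
        }
    }

    static List<String> collectDefIds(String xml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.defIds;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><transaction ts=\"1\"><abc><def id=\"a\"></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def id=\"b\"></def></abc></transaction></root>";
        System.out.println(collectDefIds(xml));
    }
}
```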
Answer 4:
You can use a control flag in your SAX implementation: raise it when you detect your condition on a certain tag, and lower it again once you exit that tag. You can then use the flag to skip any processing while the parser runs through the children of the tag you are not interested in.
Note however that your example XML is not valid. You need to use proper nesting of your tags before you can process it with a SAX implementation, as stated in the comments.
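One caveat with a plain boolean flag is that it would be lowered too early if the skipped tag could contain another element of the same name. A depth counter is one possible variation that handles this; the sketch below is only an illustration under that assumption, with hypothetical names (SkipSubtreeDemo, skipDepth, visit).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SkipSubtreeDemo {

    static final class Handler extends DefaultHandler {
        // Greater than zero while the parser is inside a subtree we decided to skip.
        private int skipDepth = 0;
        final List<String> visited = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (skipDepth > 0) {
                skipDepth++; // one level deeper inside the skipped subtree
                return;
            }
            if ("transaction".equals(qName) && !"2".equals(atts.getValue("ts"))) {
                skipDepth = 1; // raise the flag on the unwanted tag
                return;
            }
            visited.add(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (skipDepth > 0) {
                skipDepth--; // reaches zero exactly when the skipped tag closes
            }
        }
    }

    static List<String> visit(String xml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.visited;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><transaction ts=\"1\"><abc><def></def></abc></transaction>"
                + "<transaction ts=\"2\"><abc><def></def></abc></transaction></root>";
        System.out.println(visit(xml));
    }
}
```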
Source: https://stackoverflow.com/questions/18064716/sax-parser-to-skip-some-elements-which-are-not-to-be-parsed