I have the following problem:
I've got an XML file (approx. 1 GB), and have to iterate up and down in it (i.e. not sequentially, one element after the other) in order to get the required data.
If you want to use a higher-level approach than SAX, which can be very tricky to program, you could look at streaming XSLT transformations using a recent Saxon-EE release. However, you've been too vague about the precise processing you are doing for anyone to know whether this will work in your particular case.
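As a rough sketch of what the invocation could look like with Saxon's s9api (the stylesheet name transform.xsl is made up, and I'm assuming your transformation is written as streamable XSLT 3.0, e.g. with <xsl:mode streamable="yes"/>):

import net.sf.saxon.s9api.*;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class StreamingTransform {
    public static void main(String[] args) throws SaxonApiException {
        // Saxon-EE is required for streaming; "true" requests the licensed edition.
        Processor processor = new Processor(true);
        XsltCompiler compiler = processor.newXsltCompiler();
        // transform.xsl is a hypothetical stylesheet with a streamable mode.
        XsltExecutable executable =
                compiler.compile(new StreamSource(new File("transform.xsl")));
        Xslt30Transformer transformer = executable.load30();
        // The 1 GB input is streamed through the stylesheet instead of
        // being built as an in-memory tree.
        transformer.applyTemplates(new StreamSource(new File("file.xml")),
                                   processor.newSerializer(System.out));
    }
}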
SAX (Simple API for XML) will help you here.
Unlike the DOM parser, the SAX parser does not create an in-memory representation of the XML document, so it is faster and uses less memory. Instead, the SAX parser informs clients of the XML document's structure by invoking callbacks, that is, by invoking methods on an org.xml.sax.helpers.DefaultHandler instance provided to the parser.
Here is an example implementation:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
DefaultHandler handler = new MyHandler();
parser.parse("file.xml", handler);
Here, MyHandler is where you define the actions to be taken when events such as the start or end of the document or of an element are generated.
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Called once at the start of the document.
    @Override
    public void startDocument() throws SAXException {
    }

    // Called once after the whole document has been parsed.
    @Override
    public void endDocument() throws SAXException {
    }

    // Called for each opening tag; the element's attributes are available here.
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
    }

    // Called for each closing tag.
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
    }

    // Called for each chunk of character data; take specific actions here
    // (such as adding the data to a node or buffer, or printing it to a
    // file). Note that one element's text may arrive in several chunks.
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
    }
}
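For instance, a minimal handler that collects the text of every <name> element (a hypothetical element name; substitute whatever you actually need to extract) could look like this:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.util.ArrayList;
import java.util.List;

class NameCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <name> element

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        if ("name".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            // Buffer the chunk; one element's text may arrive in pieces.
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(current.toString());
            current = null;
        }
    }

    public List<String> getNames() {
        return names;
    }
}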
If you don't want to be bound by memory limits, I certainly recommend using your current approach and storing everything in a database.
The parsing of the XML file should be done with a SAX parser, as everybody (including me) has recommended. This way you can create one object at a time and immediately persist it into the database.
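A minimal sketch of that pattern with plain JDBC (the person table, its single name column, and the <person> element are all made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PersistingHandler extends DefaultHandler {
    private final PreparedStatement insert;
    private StringBuilder text;
    private int pending;

    PersistingHandler(Connection connection) throws SQLException {
        // Hypothetical table: CREATE TABLE person (name VARCHAR(255))
        insert = connection.prepareStatement("INSERT INTO person (name) VALUES (?)");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        if ("person".equals(qName)) { // hypothetical element name
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if ("person".equals(qName)) {
            try {
                insert.setString(1, text.toString());
                insert.addBatch(); // batch the INSERTs to reduce round trips
                if (++pending % 1000 == 0) {
                    insert.executeBatch();
                }
            } catch (SQLException e) {
                throw new SAXException(e);
            }
            text = null;
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            insert.executeBatch(); // flush the last partial batch
        } catch (SQLException e) {
            throw new SAXException(e);
        }
    }
}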
For the post-processing (resolving cross-references), you can use SELECTs against the database, add primary keys, indexes, etc. You can use an ORM (EclipseLink, Hibernate) as well if you feel comfortable with that.
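For example, a cross-reference pass could be a single join once everything is loaded; the schema here (person and person_ref tables) is invented just to show the shape of it:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class CrossReferenceResolver {
    // Hypothetical schema: person(id, name), person_ref(person_id, ref_id),
    // where ref_id points at another person's id.
    static void printResolvedReferences(Connection connection) throws SQLException {
        String sql = "SELECT p.name, target.name AS ref_name "
                   + "FROM person p "
                   + "JOIN person_ref r ON r.person_id = p.id "
                   + "JOIN person target ON target.id = r.ref_id";
        try (PreparedStatement stmt = connection.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("name")
                        + " -> " + rs.getString("ref_name"));
            }
        }
    }
}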
Actually, I don't really recommend SQLite; it's easier to set up a MySQL server and store the data there. Later you can even reuse the XML data (if you don't delete it).
If you require a resource-friendly approach to handle very large XML, try this: http://www.xml2java.net/xml-to-java-data-binding-for-big-data/ It allows you to process data in a SAX way, but with the advantage of getting high-level events (XML data mapped onto Java objects) and being able to work with these objects in your code directly. So it combines JAXB convenience with SAX resource friendliness.