I\'m writing an application which processes a lot of xml files (>1000) with deep node structures. It takes about six seconds with with woodstox (Event API) to parse a file w
This one is obvious: just create several parsers and run them in parallel in multiple threads.
Take a look at Woodstox Performance (down at the moment, try google cache).
This can be done IF structure of your XML is predictable: if it has a lot of same top-level elements. For instance:
more elements
other elements
In this case you could create simple splitter that searches and feeds this part to a particular parser instance. That's a simplified approach: in real life I'd go with RandomAccessFile to find start stop points () and then create custom FileInputStream that just operates on a part of file.
Take a look at Aalto. The same guys that created Woodstox. This are experts in this area - don't reinvent the wheel.