Parallel XML Parsing in Java


I'm writing an application that processes a lot of XML files (>1000) with deep node structures. It takes about six seconds with Woodstox (Event API) to parse a file w…

  •  暗喜 (OP)
    2021-01-02 03:47

    1. This one is obvious: create several parsers and run them in parallel in multiple threads, with one parser instance per thread (never share a single parser instance between threads).

    2. Take a look at the Woodstox performance page (it is down at the moment; try the Google cache).

    3. This can be done IF the structure of your XML is predictable, i.e. if it has many top-level elements of the same kind. For instance (element names are illustrative):

      <item>
          more elements
      </item>
      <item>
          other elements
      </item>
      ...

      In this case you could create a simple splitter that searches for each such element and feeds it to a separate parser instance. That's a simplified approach: in real life I'd use RandomAccessFile to find the start and end points (the opening and closing tags of each element) and then create a custom FileInputStream that reads only that part of the file.

    4. Take a look at Aalto. It was written by the same people who created Woodstox. They are experts in this area, so don't reinvent the wheel.
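Suggestion 1 can be sketched with a fixed thread pool that gives every file its own StAX reader. This is a minimal sketch under stated assumptions, not the answerer's code: the names ParallelXmlParse, countElements and countAll are hypothetical, counting START_ELEMENT events stands in for your real per-file processing, and it uses the cursor API (XMLStreamReader), which avoids allocating an event object per token, rather than the Event API from the question. XMLInputFactory.newInstance() should pick up Woodstox (or Aalto) when its jar is on the classpath, since both register as StAX providers; otherwise the JDK's built-in parser is used. Each worker keeps its own factory in a ThreadLocal so the sketch does not rely on factory thread safety, which the StAX API does not spell out.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class ParallelXmlParse {

    // One factory per worker thread, so no factory instance is ever shared.
    private static final ThreadLocal<XMLInputFactory> FACTORY =
            ThreadLocal.withInitial(XMLInputFactory::newInstance);

    /** Parses one file with its own reader; counting start elements stands in for real work. */
    static long countElements(Path file) throws Exception {
        try (InputStream in = Files.newInputStream(file)) {
            XMLStreamReader reader = FACTORY.get().createXMLStreamReader(in);
            try {
                long count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) count++;
                }
                return count;
            } finally {
                reader.close(); // does not close the stream; try-with-resources does that
            }
        }
    }

    /** Submits one task (and therefore one parser) per file and sums the results. */
    static long countAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path f : files) {
                results.add(pool.submit(() -> countElements(f)));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get(); // a parse failure surfaces here as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

For the >1000 files in the question, a pool of about Runtime.getRuntime().availableProcessors() threads is a reasonable starting point; if the disk rather than the CPU is the bottleneck, more threads may not help.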
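Suggestion 3 can be sketched as follows, under loud assumptions: the file is UTF-8 (or plain ASCII) and smaller than 2 GB; every split element is written as <tag ...>...</tag>, never self-closing and never nested inside another element of the same name; the tag text never appears inside comments or CDATA; and each chunk parses on its own (no namespace prefixes or entities declared outside it). Instead of the RandomAccessFile the answer mentions, this sketch swaps in a read-only memory-mapped FileChannel: a naive byte search finds each element's start and end offsets, and each chunk is fed to its own parser through a small InputStream over a private view of the mapped buffer. SplitXmlParse, findChunks, region and parseChunks are hypothetical names, and counting start elements stands in for real processing.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SplitXmlParse {

    /** Naive byte search for pat in buf, starting at index from; -1 if absent. */
    static int indexOf(ByteBuffer buf, byte[] pat, int from) {
        outer:
        for (int i = from; i <= buf.limit() - pat.length; i++) {
            for (int k = 0; k < pat.length; k++) {
                if (buf.get(i + k) != pat[k]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Returns {start, end} byte offsets of each <tag ...>...</tag> element. */
    static List<int[]> findChunks(ByteBuffer buf, String tag) {
        byte[] open = ("<" + tag).getBytes(StandardCharsets.UTF_8);
        byte[] close = ("</" + tag + ">").getBytes(StandardCharsets.UTF_8);
        List<int[]> chunks = new ArrayList<>();
        int i = 0;
        while ((i = indexOf(buf, open, i)) >= 0) {
            int next = i + open.length;
            byte b = next < buf.limit() ? buf.get(next) : 0;
            // Accept only "<tag>" or "<tag attr...>"; this skips e.g. <items> for tag "item".
            if (b != '>' && b != ' ' && b != '\t' && b != '\r' && b != '\n') { i = next; continue; }
            int j = indexOf(buf, close, next);
            if (j < 0) break; // unterminated element: stop scanning
            chunks.add(new int[] {i, j + close.length});
            i = j + close.length;
        }
        return chunks;
    }

    /** InputStream over bytes [start, end) of buf, read through a private duplicate view. */
    static InputStream region(ByteBuffer buf, int start, int end) {
        ByteBuffer view = buf.duplicate();
        view.limit(end);
        view.position(start);
        return new InputStream() {
            @Override public int read() {
                return view.hasRemaining() ? (view.get() & 0xFF) : -1;
            }
            @Override public int read(byte[] b, int off, int len) {
                if (len == 0) return 0;
                if (!view.hasRemaining()) return -1;
                int n = Math.min(len, view.remaining());
                view.get(b, off, n);
                return n;
            }
        };
    }

    /** Parses one chunk with its own reader; counting start elements stands in for real work. */
    static int countStartElements(InputStream in) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            int n = 0;
            while (r.hasNext()) if (r.next() == XMLStreamConstants.START_ELEMENT) n++;
            return n;
        } finally {
            r.close();
        }
    }

    /** Maps the file, splits it on <tag> elements and parses the chunks in parallel. */
    static List<Integer> parseChunks(Path file, String tag, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] c : findChunks(buf, tag)) {
                InputStream in = region(buf, c[0], c[1]); // view built before handing off
                futures.add(pool.submit(() -> countStartElements(in)));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) counts.add(f.get());
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

Workers only read through their own duplicate views, and nothing moves the shared buffer's position, so the mapped buffer can be shared without locking. A streaming splitter closer to the answer's description (RandomAccessFile plus a bounded stream) would avoid the 2 GB limit of a single mapping at the cost of more code.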
