I'm sure this has been discussed at length or answered before, but I need a bit more information on the best approach for my situation...
Problem
We had a similar situation, and I just threw together some XPath code that parsed out the parts I needed.
It was amazingly quick, even on XML files of 100k+. We went as low tech as possible. We handle around 1000 files of that size a day, and parsing time is very low. We have had no memory issues, leaks, etc.
We wrote a quick prototype in Groovy (if my memory is accurate); the proof of concept took me about 10 minutes.
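
To give a rough idea of what "just some XPath code" can look like, here is a minimal sketch. It is not our original Groovy prototype: it uses Python's standard `xml.etree.ElementTree`, which only supports a limited subset of XPath, and the sample document, the element and attribute names (`orders`, `order`, `status`, `total`), and the function `extract_open_totals` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented
# for this example and do not come from any real feed.
SAMPLE_XML = """<orders>
  <order id="1" status="open"><total>19.99</total></order>
  <order id="2" status="closed"><total>5.00</total></order>
  <order id="3" status="open"><total>42.50</total></order>
</orders>"""


def extract_open_totals(xml_text):
    """Return (id, total) pairs for every order whose status is 'open'."""
    root = ET.fromstring(xml_text)
    results = []
    # ElementTree understands a limited XPath subset, including
    # attribute predicates such as [@status='open'].
    for order in root.findall("./order[@status='open']"):
        total = order.findtext("total")
        results.append((order.get("id"), float(total)))
    return results


if __name__ == "__main__":
    for order_id, total in extract_open_totals(SAMPLE_XML):
        print(order_id, total)
```

The point is the same as in our case: pick out only the nodes you need with a path expression and ignore the rest, rather than mapping the whole document onto objects. If you need full XPath 1.0 (axes, functions, and so on), you would have to use a different library than the one in this sketch.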