Question
I will be pushed an XML document in one specific format. The document will always be of the same type, so the format is very strict.
I need to parse this so that I can convert it into JSON (well, a slightly bastardized version so someone else can use it with Dojo).
My question is: should I use a very fast, lightweight XML parser (no need for SAX, etc.; any ideas?) or write my own, basically reading the document into a StringBuffer and spinning through the characters? Under the covers, I assume all XML parsers spin through the string (or memory buffer) and parse, producing output on the way through.
Thanks
Edit:
The XML will be anywhere from 3 or 4 lines up to about 50 lines at the extreme.
Answer 1:
No, you should not try to write your own XML parser for this.
SAX itself is very lightweight and fast, so I'm not sure why you think it's too much. Using a string buffer would also be much less scalable than using SAX, because SAX doesn't require you to load the whole XML file into memory. I've used SAX to parse multi-gigabyte XML files, which you wouldn't be able to do with string buffers on a 32-bit machine.
If you have small files and you don't need to worry about performance, look into using the DOM. Java's implementation can be kind of annoying to use (you create a Document by using a DocumentBuilder, which comes from a DocumentBuilderFactory).
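To give an idea of how little code SAX needs, here is a minimal sketch using only the JDK's built-in parser. The element name `name`, the class `SaxSketch` and the method `readNames` are made up for illustration and are not from the question; the sketch also assumes `name` elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Collects the text of every <name> element while the parser streams the input.
    // "name" is a hypothetical element; swap in whatever your document uses.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("name".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append rather than assign.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<items><item><name>a</name></item><item><name>b</name></item></items>";
        System.out.println(readNames(xml)); // prints [a, b]
    }
}
```

Nothing is buffered except the text of the element currently being read, which is why this approach scales to very large inputs.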
The code to create a document from a file looks like this:
// needs javax.xml.parsers.DocumentBuilderFactory, org.w3c.dom.Document and java.io.File
Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));
(Note that keeping a reference to your DocumentBuilder will speed things up if you need to parse multiple files.)
Then you use the methods of org.w3c.dom.Document to read or manipulate the contents. For example, getElementsByTagName() returns all the Elements with a certain tag name.
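For the XML-to-JSON case in the question, a DOM-based conversion could look roughly like the sketch below. The document shape (`item` elements with an `id` attribute) and the names `DomToJsonSketch`, `itemsToJson` and `escape` are assumptions for illustration only, and the escaping is deliberately minimal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomToJsonSketch {
    // Turns every <item id="..">text</item> into {"id":"..","value":".."} inside a JSON array.
    // The element and attribute names are hypothetical.
    static String itemsToJson(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            StringBuilder json = new StringBuilder("[");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                if (i > 0) json.append(',');
                json.append("{\"id\":\"").append(escape(item.getAttribute("id")))
                    .append("\",\"value\":\"").append(escape(item.getTextContent()))
                    .append("\"}");
            }
            return json.append(']').toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Minimal escaping: backslash and double quote only; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String xml = "<items><item id=\"1\">foo</item><item id=\"2\">bar</item></items>";
        System.out.println(itemsToJson(xml));
        // prints [{"id":"1","value":"foo"},{"id":"2","value":"bar"}]
    }
}
```

For documents of 3 to 50 lines the cost of building the whole tree is negligible, so the simpler navigation is probably worth it.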
Answer 2:
It really depends on the type of XML you're parsing. I wouldn't write your own parser when there's something already there to do the job for you.
The choice between SAX and DOM really depends on what you're trying to parse. See this for how to decide which one to use:
http://geekexplains.blogspot.com/2009/04/sax-vs-dom-differences-between-dom-and.html
Even if you don't use SAX/DOM, there are still simple options available to you. Take a look at Simple :)
http://simple.sourceforge.net/
You may also want to consider StAX.
Answer 3:
Maybe you should look at kXML 2, a small XML pull parser designed for constrained environments, to access, parse, and display XML files on Java 2 Micro Edition-enabled devices. It works well with Java SE/EE too ;-). As it is designed for Micro Edition, it is really lightweight (small footprint) and IMHO really easy to use (much easier than SAX/DOM etc.).
From my own experience with kXML 2: I used it to parse XML files larger than 1 GB (Wikipedia dumps) and I was very happy with the performance and memory consumption.
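StAX ships with the JDK (javax.xml.stream) and lets you pull events from the parser instead of receiving callbacks. A minimal sketch, assuming a hypothetical `name` element and made-up names `StaxSketch` and `readNames`:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxSketch {
    // Pulls events one at a time and keeps the text of each <name> element.
    // "name" is a hypothetical element used only for illustration.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag of a text-only element.
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<items><item><name>a</name></item><item><name>b</name></item></items>";
        System.out.println(readNames(xml)); // prints [a, b]
    }
}
```

Because your code drives the loop, there is no handler state to track, which tends to read more naturally for small, fixed-format documents.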
At last ;-) - link: http://kxml.sourceforge.net/kxml2/
Answer 4:
You can use dom4j or XStream to read the XML into an equivalent Java model and then use Json-lib to convert it into JSON.
Answer 5:
Do you really need to parse or manipulate any of the data in the XML document? If not, you could just use an XSLT stylesheet. Really simple, really fast.
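As a rough illustration of the XSLT route, the JDK's javax.xml.transform API can apply a stylesheet whose output method is text, so the result is JSON rather than XML. The stylesheet, the `items`/`item` document shape and the names `XsltSketch` and `toJsonArray` are assumptions, and the sketch does no JSON escaping of the values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: emits the text of each <item> under <items> as a JSON string array.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/items\">"
      + "[<xsl:for-each select=\"item\">"
      + "<xsl:if test=\"position() != 1\">,</xsl:if>"
      + "\"<xsl:value-of select=\".\"/>\""
      + "</xsl:for-each>]"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the text output.
    // Values are copied as-is, so quotes or backslashes in the data would produce invalid JSON.
    static String toJsonArray(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray("<items><item>a</item><item>b</item></items>"));
        // prints ["a","b"]
    }
}
```

If you transform many documents, compiling the stylesheet once with TransformerFactory.newTemplates() and reusing the resulting Templates object avoids re-parsing it every time.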
Answer 6:
Use a real XML parser. If you don't, you will probably get bitten when something changes. The document may be "very strict", but in two years' time something will probably get refactored, and the structure will change in a way that still parses to the same data with an XML parser but breaks a homebrew string parser.
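A small sketch of the kind of breakage meant here: a homebrew extractor that searches for the literal string `<body>` stops working as soon as someone adds an attribute to that tag, while a DOM lookup is unaffected. The `msg`/`body` document and the names `BrittleVsRobustSketch`, `naiveBody` and `domBody` are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BrittleVsRobustSketch {
    // Homebrew approach: assumes the tag is always written exactly as "<body>".
    // Returns null when that literal text is not found.
    static String naiveBody(String xml) {
        int start = xml.indexOf("<body>");
        if (start < 0) return null;
        int end = xml.indexOf("</body>", start);
        if (end < 0) return null;
        return xml.substring(start + "<body>".length(), end);
    }

    // Parser approach: looks the element up by name, whatever attributes it carries.
    static String domBody(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("body").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String original = "<msg><body>hi</body></msg>";
        String refactored = "<msg><body lang=\"en\">hi</body></msg>"; // same data, new attribute

        System.out.println(naiveBody(original));   // prints hi
        System.out.println(naiveBody(refactored)); // prints null
        System.out.println(domBody(original));     // prints hi
        System.out.println(domBody(refactored));   // prints hi
    }
}
```

The same applies to changes in whitespace, attribute order, quoting style or character entities, none of which alter what a conforming parser reports.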
Answer 7:
Parsing on the backend and exposing JSON is probably the right way to go, since it gives you general-purpose JSON data that you can easily integrate with other sources. But if you have a simple message and this is the only place you think you'd be using JSON, you could try doing the parsing client-side. Dojo has an experimental client-side XML parser.
Answer 8:
Do you have to use XML?
I found that my own custom text format was much faster than either XML or JSON with any of the off-the-shelf packages. They were fast, but by controlling my own format and just doing String parsing I was able to cut the time in half compared with the fastest XML implementation.
Obviously this only works if you're fully in charge of formats and may not be appropriate to your situation, but for any others in this situation: don't think XML is the absolute fastest option you have. It's not.
Source: https://stackoverflow.com/questions/2134507/fast-lightweight-xml-parser