Question
I have a big XML file and I do not wish to parse it. I just want to store every single character between <information>...</information>, which are tags inside the XML file.
How can I do this?
Answer 1:
If the problem is that the data you're trying to extract will fit in memory, but the entire XML file won't, then use a streaming parser such as XPP.
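As a rough illustration of the streaming approach, here is a minimal sketch. It uses StAX (javax.xml.stream, which ships with the JDK) rather than XPP, so treat it as a stand-in for the same pull-parsing idea. The class and method names are made up for this example, and it assumes you want the text content of each <information> element, with any nested markup dropped.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls the text of each <information> element
// without ever holding the whole document in memory.
final class StreamingInformationReader {

    static List<String> readInformation(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Precaution (an assumption of this sketch): no DTDs, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<String> results = new ArrayList<>();
        StringBuilder current = null; // non-null while inside <information>
        int depth = 0;                // nesting depth relative to <information>
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (current != null) {
                            depth++;
                        } else if ("information".equals(reader.getLocalName())) {
                            current = new StringBuilder();
                            depth = 1;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so keep appending.
                        if (current != null) {
                            current.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (current != null && --depth == 0) {
                            results.add(current.toString());
                            current = null;
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return results;
    }
}
```

To run it against a file, pass something like Files.newBufferedReader(path). Only the extracted text is accumulated, so memory use depends on the size of the <information> content, not on the size of the file.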
Answer 2:
You can't accurately find the characters in the <information> element without parsing the file. You could do something that works 99% of the time, but it would break when someone does something you didn't expect, like putting whitespace in the start tag, or having a commented-out <information> element, or putting part of the <information> element in an external entity.
Bite the bullet. If it's XML, you need an XML parser to read it.
Answer 3:
You may want to explain why you don't want to parse it, as that would help in suggesting other solutions.
That being said, if you can construct an XPath for that node, you can always get that information with XPath. See this tutorial.
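For reference, an XPath lookup might look roughly like the sketch below. It uses the JDK's javax.xml.xpath API. The expression //information and the class name are assumptions made for this example; a real document may need a more specific path. Note that evaluating XPath against an InputSource builds the whole document in memory, so this is unlikely to suit a file that is too large to load.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: selects every <information> element with XPath
// and returns the text content of each one.
final class XPathInformationReader {

    static List<String> readInformation(InputSource xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//information" matches the element anywhere in the document.
        NodeList nodes = (NodeList) xpath.evaluate("//information", xml, XPathConstants.NODESET);

        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(nodes.item(i).getTextContent());
        }
        return results;
    }
}
```

A call such as readInformation(new InputSource(new FileReader("data.xml"))) would be one way to use it, where data.xml is a placeholder file name.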
UPDATE
Given the new information, this isn't the solution you want. If you want to treat the XML as a string, reading it into a StringBuilder (the faster, non-thread-safe counterpart of StringBuffer) is your best bet. If you're having trouble using StringBuffer, please post the code you tried and the error messages. Its maximum size is java.lang.Integer.MAX_VALUE characters, which is 2147483647.
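A minimal sketch of the StringBuilder approach follows, assuming the whole file fits on the heap and that the tags appear literally as <information> and </information> with no attributes or extra whitespace. The class and method names are invented for this example, and the caveats from Answer 2 apply to the plain string search.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: loads the XML as plain text and cuts out the
// characters between the first <information> and </information>.
final class StringBuilderExtractor {

    // Reads everything from the reader into a StringBuilder, 8192 chars at a time.
    static StringBuilder readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb;
    }

    // Returns the raw characters between the first pair of tags, or null if absent.
    static String firstInformation(StringBuilder xml) {
        String open = "<information>";
        String close = "</information>";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }
}
```

Unlike a parser, this returns the characters exactly as they appear in the file, including any nested markup and unexpanded entities such as &amp;.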
Answer 4:
Considering that you do not want to use a parser and are only interested in extracting all characters between two tags, I'd suggest reading the XML content as a string and using a simple regular expression match to extract the portion between the two tags.
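One way the regular expression match could look is sketched below. The pattern and the class name are assumptions for this example. The pattern only handles a start tag written exactly as <information>, so it shares the limitations described in Answer 2 (attributes, whitespace in the tag, comments, nested <information> elements).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the raw text between each
// <information>...</information> pair using a regular expression.
final class RegexInformationExtractor {

    // DOTALL lets '.' match line breaks; '.*?' is reluctant, so each match
    // stops at the nearest closing tag instead of the last one in the input.
    private static final Pattern INFORMATION =
            Pattern.compile("<information>(.*?)</information>", Pattern.DOTALL);

    static List<String> extractAll(CharSequence xml) {
        List<String> results = new ArrayList<>();
        Matcher matcher = INFORMATION.matcher(xml);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }
}
```

The input can be any CharSequence, for example a String or a StringBuilder holding the file contents.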
Source: https://stackoverflow.com/questions/6755475/how-to-extract-a-big-list-of-characters-from-xml-file-in-java