Question
I need to validate a large XML file against an XSD schema with limited memory usage. Every approach I have found so far fails with an out-of-memory error.
Methods I tried:
// Method 1
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource(inputXml));
// Method 2
XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema vs = sf.createSchema(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd"));
XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory2.newInstance().createXMLStreamReader(new FileInputStream(inputXml));
sr.validateAgainst(vs);
try {
while (sr.hasNext()) {
sr.next();
}
System.out.println("Validated ok!");
} catch (XMLValidationException ve) {
System.err.println("Validation problem: "+ve);
isValid = false;
}
sr.close();
// Method 3
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
String fileName = Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile();
Schema schema = factory.newSchema(new File(fileName));
Validator validator = schema.newValidator();
// create a source from a file
StreamSource source = new StreamSource(new File(inputXml));
// check input
validator.validate(source);
I get an OutOfMemoryError every time.
EDIT
With XOM:
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
Builder builder = new Builder(reader);
builder.build(new FileInputStream(new File(inputXml)));
Memory usage is still very high: a 15 MB XML file needs about 250 MB of heap. Stack trace:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleCharacters(XMLSchemaValidator.java:1574)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.characters(XMLSchemaValidator.java:789)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:441)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
EDIT: My XML contains a large base64 string.
Answer 1:
Take a look at Marco Tedone's article on XML unmarshalling. Based on his conclusions, I would recommend StAX for low memory consumption:
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(fileInputStream);
Validator validator = schema.newValidator();
validator.validate(new StAXSource(xmlStreamReader));
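The snippet above leaves `schema` and `fileInputStream` undefined. A minimal self-contained sketch of the same idea follows, using only JDK classes. The class name `StaxValidationSketch`, the helper methods, and the tiny inline schema are all hypothetical and exist only so the example runs without external files; in the question's setting the schema would come from `xmlresource/XSD_final2.xsd` and the input from a `FileInputStream`. Whether this route actually lowers heap use for a document dominated by one large base64 text node is not established here and would need to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class StaxValidationSketch {

    // Hypothetical inline schema, used only to keep the example self-contained.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='doc'><xs:complexType><xs:sequence>"
      + "<xs:element name='payload' type='xs:base64Binary'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compile an XSD given as a string into a reusable Schema object.
    static Schema compile(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Helper for the demo: wrap a string as an InputStream.
    static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Validate the stream through a StAX reader; returns false on a validation error.
    static boolean isValid(Schema schema, InputStream xml)
            throws IOException, XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StAXSource(reader));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation problem: " + e.getMessage());
            return false;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile(DEMO_XSD);
        // A document that matches the demo schema.
        System.out.println(isValid(schema, stream("<doc><payload>aGVsbG8=</payload></doc>")));
        // A document with an element the demo schema does not allow.
        System.out.println(isValid(schema, stream("<doc><other/></doc>")));
    }
}
```

For a real file, the `stream(...)` call would be replaced by a `FileInputStream` over the input XML, and the `Schema` object can be reused across documents.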
Answer 2:
It's possible that the memory is being used for the schema, not the source document. You haven't said anything about the schema. Some schemas can consume very large amounts of memory, for example when the content model has large finite values of minOccurs or maxOccurs. At what point does the out-of-memory error occur?
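One way to answer that question is to run schema compilation and document validation as separate steps and record which one fails. The sketch below is only an illustration of that idea: the class `ValidationPhaseProbe`, its `Phase` enum, and the small inline schema are hypothetical names invented here. Catching `OutOfMemoryError` is done purely for diagnosis and is not something to rely on in production code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidationPhaseProbe {

    // The phase in which processing stopped; DONE means both steps succeeded.
    enum Phase { SCHEMA_COMPILATION, VALIDATION, DONE }

    // Rough estimate of the heap currently in use, in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    // Helper for the demo: wrap a string as a Source.
    static Source text(String s) {
        return new StreamSource(new StringReader(s));
    }

    // Compile the schema, then validate, reporting the phase that failed (if any).
    static Phase probe(Source xsd, Source xml) {
        Phase phase = Phase.SCHEMA_COMPILATION;
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(xsd);
            System.out.println("Heap after schema compilation: " + usedHeapMb() + " MB");

            phase = Phase.VALIDATION;
            schema.newValidator().validate(xml);
            System.out.println("Heap after validation: " + usedHeapMb() + " MB");
            return Phase.DONE;
        } catch (OutOfMemoryError | SAXException | IOException e) {
            // Diagnostic only: report where the failure happened.
            System.err.println("Failed during " + phase + ": " + e);
            return phase;
        }
    }

    public static void main(String[] args) {
        // Hypothetical schema and document, small enough to run anywhere.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='doc' type='xs:string'/></xs:schema>";
        System.out.println(probe(text(xsd), text("<doc>hi</doc>")));
    }
}
```

If the failure is reported during `SCHEMA_COMPILATION`, the schema itself is the likely culprit; if it is reported during `VALIDATION`, the document content (such as the large base64 string mentioned in the question) is the more likely cause.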
Source: https://stackoverflow.com/questions/9788868/how-to-validate-big-xml-against-xsd-schema