How to read well formed XML in Java, but skip the schema?

自闭症患者 2020-11-30 10:21

I want to read an XML file that has a schema declaration in it.

And that's all I want to do: read it. I don't care if it's valid, but I want it to be well formed.

5 answers
  • 2020-11-30 10:47

    I've not tested this, but you could try calling setSchema on the factory passing null.

    i.e.

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    dbf.setSchema(null);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(file);
    

    Update: Looking at DocumentBuilderImpl, it looks like this might work: the constructor takes the grammar from the factory and only sets up a schema validator when that grammar is not null.

    From DocumentBuilderFactoryImpl:

    public void setSchema(Schema grammar) {
        this.grammar = grammar;
    }
    

    From DocumentBuilderImpl constructor:

    ...
    this.grammar = dbf.getSchema();
    if (grammar != null) {
        XMLParserConfiguration config = domParser.getXMLParserConfiguration();
        XMLComponent validatorComponent = null;
        /** For Xerces grammars, use built-in schema validator. **/
        ...
    }
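
A fuller, self-contained version of the setSchema(null) idea from this answer might look like the sketch below. It assumes the JDK's built-in parser and assumes the "schema declaration" is an XSD hint such as xsi:noNamespaceSchemaLocation (a DOCTYPE is a different case). The class name WellFormedCheck, the method isWellFormed and the file:///no/such/dir/missing.xsd location are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness without asking for any validation.
final class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // so the xsi: prefix is treated as a namespace prefix
            dbf.setValidating(false);    // no DTD validation (this is also the default)
            dbf.setSchema(null);         // no Schema object attached (assumes the JDK's factory)
            DocumentBuilder db = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a fatal error here means the document is not well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        // The schema location is an assumption and deliberately points nowhere.
        String withSchemaHint = "<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
                + "xsi:noNamespaceSchemaLocation=\"file:///no/such/dir/missing.xsd\"><child/></root>";
        String broken = "<root><child></root>";
        System.out.println(isWellFormed(withSchemaHint));
        System.out.println(isWellFormed(broken));
    }
}
```

With no schema validation requested, the parser should treat the xsi attribute as an ordinary attribute, so the missing .xsd file is never opened.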
    
  • 2020-11-30 10:51

    The reference is not for Schema, but for a DTD.

    DTD files can contain more than just structural rules. They can also declare entities. Most Java XML parsers load and parse a referenced DTD by default, even when validation is off, because the DTD may declare entities that affect how the document is parsed and what the file's content is (an entity can stand for a single character or even a whole phrase of text).

    If you want to avoid loading and parsing the referenced DTD, you can provide your own EntityResolver, test for the referenced DTD, and decide whether to return a local copy of the DTD file, return an empty InputSource to skip it, or return null to let the parser resolve the system ID itself.

    Code sample from the referenced answer on custom EntityResolvers:

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                // Skip this DTD by handing the parser an empty document.
                return new InputSource(new StringReader(""));
            } else {
                // Fall back to the parser's default resolution.
                return null;
            }
        }
    });
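
For a complete picture, here is a minimal runnable sketch of the same resolver wired into a DocumentBuilder. It assumes the JDK's built-in parser. The class name SkipDtdResolver, the method parseIgnoringDtd and the file:///no/such/dir/foo.dtd system ID are hypothetical, chosen so that the DTD cannot actually be loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML while replacing any referenced .dtd with empty content.
final class SkipDtdResolver {

    /** Parses the XML text; external DTDs are answered with an empty stream instead of being fetched. */
    static Document parseIgnoringDtd(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId != null && systemId.endsWith(".dtd")) {
                    // Empty content: the parser sees a DTD with no declarations.
                    return new InputSource(new StringReader(""));
                }
                // null asks the parser to resolve the system ID in its default way.
                return null;
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The DTD location is an assumption and deliberately points nowhere.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE root SYSTEM \"file:///no/such/dir/foo.dtd\">"
                + "<root>hello</root>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

One caveat with this approach: if the document relies on entities declared in the skipped DTD, the parse will fail on the first reference to one of them.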
    
  • 2020-11-30 10:55

    This works well for checking whether the XML is well formed, regardless of whether it contains a DTD declaration.

  • 2020-11-30 10:58

    The simplest answer is this one-liner, called after creating the DocumentBuilderFactory:

    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    

    Shamelessly cribbed from "Make DocumentBuilder.parse ignore DTD references".
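
In context, the one-liner could be used roughly as follows. This is a sketch that assumes a Xerces-based parser such as the one bundled with the JDK, since the feature URI is Xerces-specific and setFeature throws ParserConfigurationException on parsers that do not recognise it. The class name NoExternalDtd, the method parseWithoutExternalDtd and the DTD path are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: turns off loading of the external DTD subset before parsing.
final class NoExternalDtd {

    // Xerces-specific feature name, as quoted in the answer above.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML text without fetching the DTD named in its DOCTYPE. */
    static Document parseWithoutExternalDtd(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);                 // the feature only applies when not validating
            dbf.setFeature(LOAD_EXTERNAL_DTD, false); // must be set before newDocumentBuilder()
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The DTD location is an assumption and deliberately points nowhere.
        String xml = "<!DOCTYPE note SYSTEM \"file:///no/such/dir/note.dtd\">"
                + "<note><to>reader</to></note>";
        Document doc = parseWithoutExternalDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```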

  • 2020-11-30 11:02

    The issue here isn't one of validation. Regardless of validation settings, the parser will still attempt to resolve any references in your document, such as entities, DTDs and (sometimes) schemas. It's only later on that it decides to validate using them (or not). You need to plug in an entity resolver to "intercept" these attempts at de-referencing.

    Check out Apache XML Resolver for an easy(ish) way to do this.
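
To see the de-referencing described above for yourself, a small probe can record what the parser asks an entity resolver for while validation is switched off. This is only a sketch, assuming the JDK's built-in parser. ResolutionProbe and requestedSystemIds are made-up names, and it does not use Apache XML Resolver.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical probe: lists the system IDs a non-validating parse tries to resolve.
final class ResolutionProbe {

    /** Parses the XML text and returns every system ID passed to the entity resolver. */
    static List<String> requestedSystemIds(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false); // validation is off, yet the resolver is still consulted
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setEntityResolver((publicId, systemId) -> {
                seen.add(systemId);
                // Answer with empty content so nothing is fetched from disk or network.
                return new InputSource(new StringReader(""));
            });
            db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The DTD location is an assumption and deliberately points nowhere.
        String xml = "<!DOCTYPE root SYSTEM \"file:///no/such/dir/foo.dtd\"><root/>";
        System.out.println(requestedSystemIds(xml));
    }
}
```

A document without a DOCTYPE should produce an empty list, which makes it easy to tell which references a given file actually triggers.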
