Automatic XSD validation

时光取名叫无心 2021-01-04 20:35

According to the lxml documentation "The DTD is retrieved automatically based on the DOCTYPE of the parsed document. All you have to do is use a parser that has DTD validat

2 Answers
  • 2021-01-04 21:25

    I have a project with over 100 different schemas and XML trees. To manage and validate all of them, I did a few things.

    1) I created a file (e.g. xmlTrees.py) holding a dictionary for every XML document, with its path and the schema associated with it. This gave me a single place to look up both the XML and the schema used to validate it.

    MY_XML = {'url':'/pathToTree/myTree.xml', 'schema':'myXSD.xsd'}
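
A minimal sketch of how such a registry might be queried. `XML_TREES`, `SCHEMA_DIR`, and `lookup` are hypothetical names that do not appear in the answer, and the schema directory is an assumption:

```python
import posixpath

# Hypothetical registry in the spirit of xmlTrees.py: one entry per XML tree.
MY_XML = {'url': '/pathToTree/myTree.xml', 'schema': 'myXSD.xsd'}
XML_TREES = {'my_xml': MY_XML}

# Assumed location of the .xsd files; the answer does not say where they live.
SCHEMA_DIR = '/pathToSchemas'


def lookup(name):
    """Return (xml_path, schema_path) for a registered tree."""
    entry = XML_TREES[name]
    return entry['url'], posixpath.join(SCHEMA_DIR, entry['schema'])


xml_path, schema_path = lookup('my_xml')
print(xml_path, schema_path)
```

Keeping the lookup in one function means tests and scripts never hard-code a schema file name.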
    

    2) The project has just as many namespaces, which are very hard to manage. So again I created a single file containing all the namespaces in the format lxml likes (a dictionary mapping prefix to URI). Then in my tests and scripts I always pass that superset of namespaces.

    ALL_NAMESPACES = {
        'namespace1':  'http://www.example.org',
        'namespace2':  'http://www.example2.org'
    }
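
As an illustration of passing the superset, here is a short sketch using lxml's `xpath()` with its `namespaces` argument. The sample document is made up for the example:

```python
from lxml import etree

# Same shape as the ALL_NAMESPACES dictionary above: prefix -> namespace URI.
ALL_NAMESPACES = {
    'namespace1': 'http://www.example.org',
    'namespace2': 'http://www.example2.org',
}

# Hypothetical document that only uses the first namespace.
doc = etree.fromstring(
    b'<root xmlns="http://www.example.org">'
    b'<name>demo</name>'
    b'</root>'
)

# Prefixes in the XPath expression are resolved through the dictionary;
# entries the expression does not use (namespace2 here) do no harm.
names = doc.xpath('//namespace1:name/text()', namespaces=ALL_NAMESPACES)
print(names)
```

Note that the prefixes in the dictionary are independent of the prefixes used inside the document itself; only the URIs have to match.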
    

    3) For basic, generic validation I ended up writing a function I could call:

    import logging
    from io import BytesIO

    from lxml import etree


    def validateXML(content, schemaContent):
        """Validate XML content against XSD content, both passed as strings."""
        response = {}
        try:
            # Parse from bytes; this assumes any declared encoding is UTF-8.
            xmlSchema_doc = etree.parse(BytesIO(schemaContent.encode("utf-8")))
            xmlSchema = etree.XMLSchema(xmlSchema_doc)
            xml = etree.parse(BytesIO(content.encode("utf-8")))
        except (etree.XMLSyntaxError, etree.XMLSchemaParseError):
            message = "Could not parse schema or content to validate xml"
            logging.critical(message)
            response['valid'] = False
            response['errorlog'] = message
            return response

        # Validate the content against the schema.
        try:
            xmlSchema.assertValid(xml)
            response['valid'] = True
            response['errorlog'] = None
        except etree.DocumentInvalid:
            response['valid'] = False
            response['errorlog'] = xmlSchema.error_log

        return response
    

    Basically, any function that wants to use this needs to send the XML content and the XSD content as strings. This gave me the most flexibility. I then placed this function in a file with all my XML helper functions.
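
To show what calling such a string-based helper could look like, here is a compact, self-contained sketch. `check`, the schema, and both sample documents are hypothetical and only mirror the shape of the response dictionary described above; it also assumes the inputs are UTF-8:

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema and documents, used only for this illustration.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name" type="xs:string"/>
</xs:schema>"""
GOOD = "<name>demo</name>"
BAD = "<other>demo</other>"


def check(content, schema_content):
    """Validate XML text against XSD text and return a small result dict."""
    schema_doc = etree.parse(BytesIO(schema_content.encode("utf-8")))
    schema = etree.XMLSchema(schema_doc)
    doc = etree.parse(BytesIO(content.encode("utf-8")))
    # validate() returns a bool instead of raising like assertValid().
    valid = schema.validate(doc)
    return {'valid': valid, 'errorlog': None if valid else schema.error_log}


good = check(GOOD, XSD)
bad = check(BAD, XSD)
print(good['valid'], bad['valid'])
```

Using `validate()` rather than `assertValid()` avoids a second try/except when the caller only needs a yes/no answer plus the error log.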

  • 2021-01-04 21:36

    You could extract the schemas yourself and import them into a root schema:

    from lxml import etree
    
    XSI = "http://www.w3.org/2001/XMLSchema-instance"
    XS = '{http://www.w3.org/2001/XMLSchema}'
    
    
    SCHEMA_TEMPLATE = """<?xml version = "1.0" encoding = "UTF-8"?>
    <xs:schema xmlns="http://dummy.libxml2.validator"
    targetNamespace="http://dummy.libxml2.validator"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1.0"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
    </xs:schema>"""
    
    
    def validate_XML(xml):
        """Validate an XML file represented as string. Follow all schemaLocations.
    
        :param xml: XML represented as string.
        :type xml: str
        """
        tree = etree.XML(xml)
        schema_tree = etree.XML(SCHEMA_TEMPLATE)
        # Find all unique instances of 'xsi:schemaLocation="<namespace> <path-to-schema.xsd> ..."'
        schema_locations = set(tree.xpath("//*/@xsi:schemaLocation", namespaces={'xsi': XSI}))
        for schema_location in schema_locations:
            # Split namespaces and schema locations; use strip to remove leading
            # and trailing whitespace.
            namespaces_locations = schema_location.strip().split()
            # Import all found namespace/schema location pairs
            for namespace, location in zip(*[iter(namespaces_locations)] * 2):
                xs_import = etree.Element(XS + "import")
                xs_import.attrib['namespace'] = namespace
                xs_import.attrib['schemaLocation'] = location
                schema_tree.append(xs_import)
        # Construct the schema
        schema = etree.XMLSchema(schema_tree)
        # Validate!
        schema.assertValid(tree)
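
The `zip(*[iter(...)] * 2)` idiom in the loop above may be unfamiliar. The following standard-library sketch isolates just that step; `pair_schema_locations` is a hypothetical helper name and the sample value is made up:

```python
def pair_schema_locations(schema_location):
    """Split an xsi:schemaLocation value into (namespace, location) pairs."""
    tokens = schema_location.strip().split()
    # The same iterator object appears twice in the list, so zip() pulls two
    # consecutive tokens for every pair. A trailing unpaired token is dropped.
    return list(zip(*[iter(tokens)] * 2))


value = """
    http://www.example.org simpletest.xsd
    http://www.example2.org other.xsd
"""
for namespace, location in pair_schema_locations(value):
    print(namespace, location)
```

Because `split()` with no argument splits on any run of whitespace, values that span several lines are handled the same way as single-line ones.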
    

    BTW, your simpletest.xsd is missing the targetNamespace. It should look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.example.org" elementFormDefault="qualified">
        <xs:element name="name" type="xs:string"/>
    </xs:schema>
    

    With the code above, your example document validates against this schema.
