I have some Delphi code to read and validate XML files based on an XSD document. I am using the Windows DOM (TXMLDocument). This article explains the underlying logic.
TXMLDocument does not directly support enabling XSD validation when using MSXML, so it is MSXML's responsibility to manage it. Enabling the poResolveExternals and poValidateOnParse flags is important for that, but there are some other factors to consider. Most importantly, although MSXML does support referencing an XSD from within the XML, it has some limitations on whether the referenced XSD will actually be used while loading the XML:
Referencing XSD Schemas in Documents
To reference an XML Schema (XSD) schema from an XML document in MSXML 6.0, you can use any one of the following means to link a schema to an XML document so that MSXML will use the schema to validate the document contents.
Reference the XSD schema in the XML document using XML schema instance attributes such as either xsi:schemaLocation or xsi:noNamespaceSchemaLocation.
Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.
...
The xsi:schemaLocation attribute works well in situations where namespace prefixes are explicitly declared and used in the XML document you want to validate.
The following example shows an XML document that references an external XSD schema, MyData.xsd, for use in validating nodes that are in the 'urn:MyData' namespace URI, which is mapped to the "MyData:" namespace prefix.
<catalog xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
         xsi:schemaLocation="urn:MyData http://www.example.com/MyData.xsd">
  <MyData:book xmlns:MyData="urn:MyData">
    <MyData:title>Presenting XML</MyData:title>
    <MyData:author>Richard Light</MyData:author>
  </MyData:book>
</catalog>
In order for the MyData.xsd file to be paired with and used to validate element and attribute nodes that start with the "MyData:" prefix, the schema needs to use and contain the following schema attributes:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:MyData="urn:MyData" targetNamespace="urn:MyData" elementFormDefault="qualified">
These attributes declare the 'urn:MyData' namespace URI and the "MyData:" namespace prefix so that they correspond identically to how these declarations were made in the XML file. If they do not match, the schema at the specified location would never be invoked during validation.
You have not shown your XSD yet, but the XML you have shown does not conform to the rules mentioned in the above documentation. In particular, you are missing the use of a urn namespace mapping, and prefixes on the XML nodes that you want to validate. Some versions of MSXML might handle this better than others, which could explain why the validation works on some machines and is ignored on other machines, depending on which versions of MSXML are installed.
That being said, you may have to resort to the second approach mentioned in the documentation:
- Add the XSD schema file to a schema cache and then connect that cache to the DOM document or SAX reader, prior to loading or parsing the XML document.
That requires using MSXML directly; you can't do it with TXMLDocument:
MSXML also provides a means to connect and use a schema cache to store, load and connect a schema to an XML document, such as in the following VBScript code excerpt:
'Create the schema cache and add the XSD schema to it.
set oSC = CreateObject("MSXML2.XMLSchemaCache.6.0")
oSC.Add "urn:MyData", "http://www.example.com/MyData.xsd"
'Create the DOM document and assign the cache to its schemas property.
set oXD = CreateObject("MSXML2.DOMDocument.6.0")
oXD.schemas = oSC
'Set properties, load and validate it in the XML DOM.
The gotcha is that you have to know where the XSD is located in order to hook it up to the parser. So, you would have to load the XML once just to extract the XSD location, then load the XSD into a schema cache, and then re-load the XML with the XSD attached. Here are some Delphi examples of that:
schema validation with msxml in delphi
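function TForm1.ValidXML2(const xmlFile: String;
  out err: IXMLDOMParseError): Boolean;
var
  xml, xml2, xsd: IXMLDOMDocument2;
  schemas, cache: IXMLDOMSchemaCollection;
begin
  // Defaults for the paths that never reach the validating load
  Result := False;
  err := nil;
  // First pass: load without a schema, just to discover the namespaces
  xml := CoDOMDocument.Create;
  xml.async := False; // load synchronously so the result of load() is meaningful
  if xml.load(xmlFile) then
  begin
    schemas := xml.namespaces;
    if schemas.length > 0 then
    begin
      // Load the XSD and register it in a schema cache
      xsd := CoDOMDocument40.Create;
      xsd.Async := False;
      xsd.load(schemas.namespaceURI[0]);
      cache := CoXMLSchemaCache40.Create;
      cache.add(schemas.namespaceURI[1], xsd);
      // Second pass: reload the XML with the cache attached so it is validated
      xml2 := CoDOMDocument40.Create;
      xml2.async := False;
      xml2.schemas := cache;
      Result := xml2.load(xmlFile);
      //err := xml.validate;
      if not Result then
        err := xml2.parseError
      else
        err := nil;
    end;
  end;
end;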
How to validate a IXMLDocument against a XML Schema?
unit XMLValidate;
// Requirements ----------------------------------------------------------------
//
// MSXML 4.0 Service Pack 1
// http://www.microsoft.com/downloads/release.asp?releaseid=37176
//
// -----------------------------------------------------------------------------
interface
uses
  SysUtils, XMLIntf, xmldom, XMLSchema;
type
  EValidateXMLError = class(Exception)
  private
    FErrorCode: Integer;
    FReason: string;
  public
    constructor Create(AErrorCode: Integer; const AReason: string);
    property ErrorCode: Integer read FErrorCode;
    property Reason: string read FReason;
  end;
procedure ValidateXMLDoc(const Doc: IDOMDocument; const SchemaLocation, SchemaNS: WideString); overload;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const SchemaLocation, SchemaNS: WideString); overload;
procedure ValidateXMLDoc(const Doc: IDOMDocument; const Schema: IXMLSchemaDoc); overload;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const Schema: IXMLSchemaDoc); overload;
implementation
uses
  Windows, ComObj, msxmldom, MSXML2_TLB;
resourcestring
  RsValidateError = 'Validate XML Error (%.8x), Reason: %s';
{ EValidateXMLError }
constructor EValidateXMLError.Create(AErrorCode: Integer; const AReason: string);
begin
  inherited CreateResFmt(@RsValidateError, [AErrorCode, AReason]);
  FErrorCode := AErrorCode;
  FReason := AReason;
end;
{ Utility routines }
function DOMToMSDom(const Doc: IDOMDocument): IXMLDOMDocument2;
begin
  Result := ((Doc as IXMLDOMNodeRef).GetXMLDOMNode as IXMLDOMDocument2);
end;
function LoadMSDom(const FileName: WideString): IXMLDOMDocument2;
begin
  Result := CoDOMDocument40.Create;
  Result.async := False;
  Result.resolveExternals := True; //False;
  Result.validateOnParse := True;
  Result.load(FileName);
end;
{ Validate }
procedure InternalValidateXMLDoc(const Doc: IDOMDocument; const SchemaDoc: IXMLDOMDocument2; const SchemaNS: WideString);
var
  MsxmlDoc: IXMLDOMDocument2;
  SchemaCache: IXMLDOMSchemaCollection;
  Error: IXMLDOMParseError;
begin
  MsxmlDoc := DOMToMSDom(Doc);
  SchemaCache := CoXMLSchemaCache40.Create;
  SchemaCache.add(SchemaNS, SchemaDoc);
  MsxmlDoc.schemas := SchemaCache;
  Error := MsxmlDoc.validate;
  if Error.errorCode <> S_OK then
    raise EValidateXMLError.Create(Error.errorCode, Error.reason);
end;
procedure ValidateXMLDoc(const Doc: IDOMDocument; const SchemaLocation, SchemaNS: WideString);
begin
  InternalValidateXMLDoc(Doc, LoadMSDom(SchemaLocation), SchemaNS);
end;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const SchemaLocation, SchemaNS: WideString);
begin
  InternalValidateXMLDoc(Doc.DOMDocument, LoadMSDom(SchemaLocation), SchemaNS);
end;
procedure ValidateXMLDoc(const Doc: IDOMDocument; const Schema: IXMLSchemaDoc);
begin
  InternalValidateXMLDoc(Doc, DOMToMSDom(Schema.DOMDocument), '');
end;
procedure ValidateXMLDoc(const Doc: XMLIntf.IXMLDocument; const Schema: IXMLSchemaDoc);
begin
  InternalValidateXMLDoc(Doc.DOMDocument, DOMToMSDom(Schema.DOMDocument), '');
end;
end.
Doc := LoadXMLData(XmlFileEdit.Lines.Text);
ValidateXMLDoc(Doc, FSchemaFileName, 'http://www.foo.com');
XML Documents, Schemas and Validation
var
  XML, XSDL: Variant;
begin
  XSDL := CreateOLEObject('MSXML2.XMLSchemaCache.4.0');
  XSDL.validateOnLoad := True;
  XSDL.add('','MySchema.xsd'); // 1st argument is target namespace
  ShowMessage('Schema Loaded');
  XML := CreateOLEObject('MSXML2.DOMDocument.4.0');
  XML.validateOnParse := True;
  XML.resolveExternals := True;
  XML.schemas := XSDL;
  XML.load('file.xml');
  ShowMessage(XML.parseError.reason);
end.
I know this question is tagged for Delphi, but I thought some Embarcadero C++ Builder users might benefit from seeing a C++ implementation of Remy's last example using MSXML2 OLE objects.
I know I wish someone would have posted this a few days ago. XD
.h file:
//------------------------------------------------------------------------------
#ifndef XmlValidatorUH
#define XmlValidatorUH
//------------------------------------------------------------------------------
class PACKAGE TXmlValidator
{
private:
    Variant FSchemaCache;
    Variant FXmlDomDoc;
    // TAutoCmd Variables
    Procedure CacheProcAdd;
    PropertySet CacheSetValidateOnLoad;
    Procedure XmlProcLoadXml;
    PropertySet XmlSetValidateOnParse;
    PropertySet XmlSetResolveExternals;
    PropertySet XmlSetSchemas;
    PropertyGet XmlGetParseError;
    PropertyGet ParseErrorGetReason;
public:
    __fastcall TXmlValidator( String _SchemaLocation );
    String __fastcall ValidationError( String _Xml );
};
//------------------------------------------------------------------------------
#endif
.cpp file:
//------------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
//------------------------------------------------------------------------------
#include "XmlValidatorU.h"
#include <System.Win.ComObj.hpp>
//------------------------------------------------------------------------------
#pragma package(smart_init)
//------------------------------------------------------------------------------
// Validates XML against Schema
//------------------------------------------------------------------------------
// This class uses OLE objects from MSXML2 to validate XML from an XSD file.
// Generally, use the following steps to deal with OLE objects:
// 1. Define a Variant variable for your OLE Object; assign using CreateOleObject().
// 2. Define your TAutoCmd objects that will be used in Variant.Exec()
// 3. Set TAutoCmd args using << to add settings
// 4. Once everything is set up, call Exec() on your OLE Object variant
// More documentation on OLE objects / TAutoCmd at:
// http://docwiki.embarcadero.com/CodeExamples/Rio/en/AutoCmd_(C%2B%2B)
//------------------------------------------------------------------------------
// This macro clarifies that we're registering OLE Function names to our defined TAutoCmd variables.
//
#define RegisterAutoCmd( _AutoCmd, _OleFunc ) _AutoCmd( _OleFunc )
//------------------------------------------------------------------------------
// These macros clear AutoCmdArgs before setting them.
// I made these because setting an arg multiple times just stacks them up, changing the function signature.
// Then, OLE throws a "Member Not Found" error because it can't find a function with that signature.
//
#define AutoCmdArg( _AutoCmd, _Arg ) _AutoCmd.ClearArgs(); _AutoCmd << _Arg
#define AutoCmdArgs( _AutoCmd, _Arg1, _Arg2 ) AutoCmdArg( _AutoCmd, _Arg1 ); _AutoCmd << _Arg2
//------------------------------------------------------------------------------
__fastcall TXmlValidator::TXmlValidator( String _SchemaLocation )
    :
    RegisterAutoCmd( CacheProcAdd, "add" ),
    RegisterAutoCmd( CacheSetValidateOnLoad, "validateOnLoad" ),
    RegisterAutoCmd( XmlProcLoadXml, "loadXML" ),
    RegisterAutoCmd( XmlSetValidateOnParse, "validateOnParse" ),
    RegisterAutoCmd( XmlSetResolveExternals, "resolveExternals" ),
    RegisterAutoCmd( XmlSetSchemas, "schemas" ),
    RegisterAutoCmd( XmlGetParseError, "parseError" ),
    RegisterAutoCmd( ParseErrorGetReason, "reason" )
{
    if ( _SchemaLocation.IsEmpty() )
    {
        throw Exception( String( __FUNC__ ) + " - Missing Schema Location" );
    }
    // Instantiate the OLE objects
    FSchemaCache = CreateOleObject( "MSXML2.XMLSchemaCache.4.0" );
    FXmlDomDoc = CreateOleObject( "MSXML2.DOMDocument.4.0" );
    // Set static args that shouldn't change
    AutoCmdArg( CacheSetValidateOnLoad, true );
    AutoCmdArg( XmlSetValidateOnParse, true );
    AutoCmdArg( XmlSetResolveExternals, true );
    const AnsiString NoNameSpace = "";
    AutoCmdArgs( CacheProcAdd, NoNameSpace, AnsiString( _SchemaLocation ) );
    // Load Cache
    FSchemaCache.Exec( CacheSetValidateOnLoad ); // Validate on Load
    FSchemaCache.Exec( CacheProcAdd ); // Add Schema file location to the cache
    // Now that the cache is loaded, set cached schema as arg to XML
    AutoCmdArg( XmlSetSchemas, FSchemaCache );
}
//------------------------------------------------------------------------------
String __fastcall TXmlValidator::ValidationError( String _Xml )
{
    AutoCmdArg( XmlProcLoadXml, AnsiString( _Xml ) );
    FXmlDomDoc.Exec( XmlSetValidateOnParse );
    FXmlDomDoc.Exec( XmlSetResolveExternals );
    FXmlDomDoc.Exec( XmlSetSchemas );
    FXmlDomDoc.Exec( XmlProcLoadXml );
    Variant ParseErr = FXmlDomDoc.Exec( XmlGetParseError );
    return ParseErr.Exec( ParseErrorGetReason );
}
//------------------------------------------------------------------------------
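The class above takes the schema location as a constructor argument, so the caller still has to find it. If the XML carries an xsi:schemaLocation attribute, as in the MSXML documentation quoted earlier, its value is a whitespace-separated list of alternating namespace URIs and schema locations. Below is a minimal sketch of splitting that value in plain standard C++. The function names SplitSchemaLocation and FindSchemaLocation are hypothetical helpers, not part of MSXML or the VCL. Reading the attribute value from the document is left out, and xsi:noNamespaceSchemaLocation (a single location with no namespace) is not handled.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits the value of an xsi:schemaLocation attribute
// into (namespace URI, schema location) pairs. Tokens are separated by
// whitespace; a trailing token without a partner is ignored.
std::vector<std::pair<std::string, std::string>>
SplitSchemaLocation(const std::string& value)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    std::istringstream tokens(value);
    std::string ns, location;
    while (tokens >> ns >> location)
        pairs.emplace_back(ns, location);
    return pairs;
}

// Hypothetical helper: returns the schema location paired with targetNs,
// or an empty string when the attribute value does not mention it.
std::string FindSchemaLocation(const std::string& value,
                               const std::string& targetNs)
{
    for (const auto& entry : SplitSchemaLocation(value))
        if (entry.first == targetNs)
            return entry.second;
    return std::string();
}
```

With the attribute value from the quoted documentation, "urn:MyData http://www.example.com/MyData.xsd", looking up "urn:MyData" gives http://www.example.com/MyData.xsd. That location and namespace are what you would then add to the schema cache before the second, validating load.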