Our C++ application reads configuration data from XML files that look something like this:
Just a guess, but can you try adding use="required" to each of your attribute specifications?
<xs:complexType name="value_type">
  <!-- This doesn't work -->
  <xs:sequence>
    <xs:attribute name="id" type="xs:string" use="required" />
    <xs:attribute name="name" type="xs:string" use="required" />
    <xs:attribute name="description" type="xs:string" use="required" />
  </xs:sequence>
</xs:complexType>
I'm wondering if the parser is being slowed down by allowing optional attributes, when it appears your attributes will always be there.
Again, just a guess.
EDIT: The XML 1.0 spec says that attribute order is not significant: http://www.w3.org/TR/REC-xml/#sec-starttags
Therefore, XSD won't enforce any order. But that doesn't mean that parsers can't be fooled into working quickly, so I'm keeping the above answer published in case it actually works.
I don't think XML Schema supports that. Attributes are defined and restricted by name only, i.e. they have to match a particular name, and I don't see how you could define an order for those attributes in XSD.
I don't know of any other way to make sure the attributes on an XML element come in a particular order. I'm not sure whether any of the other XML schema mechanisms, like Schematron or RELAX NG, would support that.
As others have pointed out, no, you can't rely on attribute ordering.
If I had any process at all involving 2,500 XML files and 1.5 million key/value pairs, I would get that data out of XML and into a more usable form as soon as I possibly could. A database, a binary serialization format, whatever. You're not getting any advantage out of using XML (other than schema validation). I'd update my store every time I got a new XML file, and take parsing 1.5 million XML elements out of the main flow of my process.
According to the XML specification, "the order of attribute specifications in a start-tag or empty-element tag is not significant". You can check it in section 3.1.
The answer is no, alas. I'm shocked by your 40% figure. I find it hard to believe that turning "foo" into ProcessFoo takes that long. Are you sure the 40% doesn't include the time taken to execute ProcessFoo?
Is it possible to access the attributes by name using this Expat thing? That's the more traditional way to access attributes. I'm not saying it's going to be faster, but it might be worth a try.
I'm pretty sure there's no way to enforce attribute order in an XML document. I'm going to assume that you can insist on it via a business process or other human factors, such as a contract or other document.
What if you just assumed that the first attribute was "id", and tested the name to be sure? If it matches, use the value; if not, you can fall back to getting the attribute by name, or throw out the document.
That's not as efficient as reading the attribute purely by its ordinal, but the check is a single comparison, and every time your data providers have delivered XML to spec the guess will be right. The rest of the time, you can take other action.
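Something along these lines, as a sketch only. It assumes the attributes arrive the way Expat hands them to a start-element handler, as a null-terminated array of alternating name/value strings; `attribute_value` and its `expected_index` parameter are names I made up for the example.

```cpp
#include <cstring>

// Hypothetical helper. atts is a null-terminated array of alternating
// name/value C strings. expected_index is the position where we guess
// the attribute sits (0 for the first attribute, 1 for the second, ...).
// Returns the attribute's value, or nullptr if it is not present.
const char* attribute_value(const char* const* atts,
                            const char* name,
                            int expected_index) {
    if (atts == nullptr) return nullptr;

    // Step to the guessed slot without running past the terminator.
    const char* const* p = atts;
    for (int slot = 0; slot < expected_index && p[0] != nullptr; ++slot) {
        p += 2;
    }

    // Fast path: one comparison when the document follows the agreed order.
    if (p[0] != nullptr && std::strcmp(p[0], name) == 0) return p[1];

    // Slow path: the guess was wrong, so search every attribute by name.
    for (p = atts; p[0] != nullptr; p += 2) {
        if (std::strcmp(p[0], name) == 0) return p[1];
    }
    return nullptr;
}
```

Inside the handler you would call it as `attribute_value(atts, "id", 0)`, `attribute_value(atts, "name", 1)` and so on. If a null result should mean "throw out the document" rather than "try something else", that decision stays with the caller.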