Question
I am relatively new to XSL-T. My requirement is rather simple: I want to add the elements that are defined in the schema but missing from the XML as empty tags.
For example, take this schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified">
  <xs:element name="RootElement">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="XMLTagOne">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="value1"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="RepeatableElementOne" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="value2"/>
              <xs:element name="RepeatableElemenTwo" maxOccurs="unbounded" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element type="xs:string" name="value3"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Consider this input:
<RootElement>
  <RepeatableElementOne>
    <value2>bb</value2>
    <RepeatableElemenTwo>
      <value3>cc</value3>
    </RepeatableElemenTwo>
    <RepeatableElemenTwo>
      <value3>dd</value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
  <RepeatableElementOne>
    <value2>ee</value2>
  </RepeatableElementOne>
</RootElement>
For this input, I want the elements <XMLTagOne> and <RepeatableElemenTwo> to be added as empty tags.
Expected output:
<RootElement>
  <XMLTagOne> <!-- added as an empty tag though not present in the input -->
    <value1></value1>
  </XMLTagOne>
  <RepeatableElementOne>
    <value2>bb</value2>
    <RepeatableElemenTwo>
      <value3>cc</value3>
    </RepeatableElemenTwo>
    <RepeatableElemenTwo>
      <value3>dd</value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
  <RepeatableElementOne>
    <value2>ee</value2>
    <RepeatableElemenTwo>
      <value3></value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
</RootElement>
After some initial research, I found that I have to traverse every node with an identity template that matches all elements. Can you suggest how I could approach this problem? Thanks.
EDIT
My design approach:
- Create an intermediate XML document based on the XSD, something like this:
  <Root>
    <a></a>
    <b></b>
  </Root>
- Traverse all the individual nodes (identity template??).
- Get the value of each node from the source XML, which I have in hand.
Problems with this approach:
- Repeating elements.
- I will have to check whether the count is > 1; if so, use <xsl:for-each> to process the nodes from the source document.
Answer 1:
Let us suppose you have what you called an intermediate XML document based on the XSD schema (more on this at the end); from now on I will call it the specimen XML:
Specimen XML:
<RootElement>
  <XMLTagOne>
    <value1 property=""> </value1>
  </XMLTagOne>
  <RepeatableElementOne attr1="">
    <value2></value2>
    <RepeatableElemenTwo>
      <value3></value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
</RootElement>
(I added a couple of attributes to show that the proposed solution works for them too)
Input XML:
<RootElement>
  <RepeatableElementOne attr1="lorem ipsum">
    <value2>bb</value2>
    <RepeatableElemenTwo>
      <value3>cc</value3>
    </RepeatableElemenTwo>
    <RepeatableElemenTwo>
      <value3>dd</value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
  <RepeatableElementOne>
    <value2>ee</value2>
  </RepeatableElementOne>
</RootElement>
(as per input in the OP, with an added attribute)
XSLT 1.0:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="1.0">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <!--
      some node in the input document could be missing,
      so we must apply the templates to the specimen document nodes
    -->
    <xsl:apply-templates select="document('specimen.xml')/*">
      <xsl:with-param name="instanceNode" select="*"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:param name="instanceNode"/>
    <xsl:attribute name="{name(.)}">
      <xsl:value-of select="$instanceNode"/>
    </xsl:attribute>
  </xsl:template>
  <xsl:template match="*">
    <xsl:param name="instanceNode"/>
    <xsl:choose>
      <xsl:when test="$instanceNode">
        <!-- the node is present in the input document -->
        <xsl:copy>
          <!-- attributes -->
          <xsl:for-each select="@*">
            <xsl:apply-templates select=".">
              <xsl:with-param name="instanceNode" select="$instanceNode/@*[name() = name(current())]"/>
            </xsl:apply-templates>
          </xsl:for-each>
          <!-- elements -->
          <xsl:for-each select="*">
            <xsl:variable name="specimenNode" select="."/>
            <xsl:variable name="instanceNodes" select="$instanceNode/*[name() = name(current())]"/>
            <xsl:choose>
              <xsl:when test="$instanceNodes">
                <!-- one or more elements in the input file -->
                <xsl:for-each select="$instanceNodes">
                  <xsl:apply-templates select="$specimenNode">
                    <xsl:with-param name="instanceNode" select="."/>
                  </xsl:apply-templates>
                </xsl:for-each>
              </xsl:when>
              <xsl:otherwise>
                <!-- missing element in the input file -->
                <xsl:apply-templates select="$specimenNode">
                  <xsl:with-param name="instanceNode" select="''"/>
                </xsl:apply-templates>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
          <!-- text nodes -->
          <!-- (working hypothesis: no mixed content) -->
          <xsl:value-of select="$instanceNode/text()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <!-- the node is missing in the input document -->
        <xsl:copy>
          <xsl:apply-templates select="* | @*">
            <xsl:with-param name="instanceNode" select="''"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Resulting output:
<RootElement>
  <XMLTagOne>
    <value1 property=""/>
  </XMLTagOne>
  <RepeatableElementOne attr1="lorem ipsum">
    <value2>bb</value2>
    <RepeatableElemenTwo>
      <value3>cc</value3>
    </RepeatableElemenTwo>
    <RepeatableElemenTwo>
      <value3>dd</value3>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
  <RepeatableElementOne attr1="">
    <value2>ee</value2>
    <RepeatableElemenTwo>
      <value3/>
    </RepeatableElemenTwo>
  </RepeatableElementOne>
</RootElement>
Notable points:
- the templates in the stylesheet are applied to the nodes of the specimen XML, because they must also run for the elements that are missing from the input file
- checking attributes is the easy part, as each one is either present in or missing from the input XML; we create them from the specimen XML and populate them with the value from the input XML, if it exists
- checking elements is a bit trickier, as they can be repeated (so for each element in the specimen XML we must loop over the corresponding elements in the input XML)
- I worked under the hypothesis that there is no mixed content, so each element contains either text or other elements; this allows us to simply copy the text nodes found in the input file
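For readers who want to experiment outside an XSLT processor, here is a rough Python sketch of the same merge, using only the standard library's xml.etree.ElementTree. The function name merge is hypothetical, and like the stylesheet it assumes no namespaces and no mixed content, matches elements by name, and keeps only the attributes that appear in the specimen.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def merge(specimen: ET.Element, instance: Optional[ET.Element]) -> ET.Element:
    """Copy the shape of `specimen`, filling in values from `instance` (None = missing)."""
    out = ET.Element(specimen.tag)
    # Attributes: one per specimen attribute, valued from the instance if present.
    for name in specimen.attrib:
        out.set(name, instance.get(name, "") if instance is not None else "")
    # Elements: for each specimen child, one output copy per matching instance
    # child (this handles repetition), or a single empty copy if none match.
    for spec_child in specimen:
        matches = [] if instance is None else [c for c in instance if c.tag == spec_child.tag]
        for inst_child in matches or [None]:
            out.append(merge(spec_child, inst_child))
    # Text: taken from the instance only (working hypothesis: no mixed content).
    if instance is not None and instance.text and instance.text.strip():
        out.text = instance.text
    return out


SPECIMEN = """<RootElement>
  <XMLTagOne><value1 property=""> </value1></XMLTagOne>
  <RepeatableElementOne attr1="">
    <value2></value2>
    <RepeatableElemenTwo><value3></value3></RepeatableElemenTwo>
  </RepeatableElementOne>
</RootElement>"""

INPUT = """<RootElement>
  <RepeatableElementOne attr1="lorem ipsum">
    <value2>bb</value2>
    <RepeatableElemenTwo><value3>cc</value3></RepeatableElemenTwo>
    <RepeatableElemenTwo><value3>dd</value3></RepeatableElemenTwo>
  </RepeatableElementOne>
  <RepeatableElementOne><value2>ee</value2></RepeatableElementOne>
</RootElement>"""

result = merge(ET.fromstring(SPECIMEN), ET.fromstring(INPUT))
print(ET.tostring(result, encoding="unicode"))
```

The recursion mirrors the two branches of the `*` template: a present node is filled from the input, and a missing node (None here, '' in the stylesheet) produces an empty copy of the specimen subtree.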
As to how to get the specimen XML:
- in the most general situation, an XML Schema can define a nested structure with an unlimited nesting level (for example, nested <div> elements in XHTML)
- schemas can contain choices (element <A> can contain either <B1> or <B2>), which would make creating an appropriate specimen XML extremely difficult, if not altogether impossible
- if we limit ourselves to XML Schemas with no recursive types and no xs:choice or xs:all (only xs:sequence), which I believe is a sensible hypothesis, then an XSLT transformation that produces an XML with all the possible attributes and elements should be pretty straightforward
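Under exactly those restrictions, generating the specimen is a short recursive walk over the schema. The sketch below does it in Python with xml.etree.ElementTree rather than XSLT. It additionally assumes (these are assumptions, not part of the answer) that elements are declared inline with @name (no @ref, no named types), that complex content is an anonymous xs:complexType holding a single xs:sequence, and that the first global xs:element is the document root. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # Clark-notation prefix used by ElementTree


def specimen_from_decl(decl: ET.Element) -> ET.Element:
    """Build one empty specimen element from an xs:element declaration."""
    out = ET.Element(decl.get("name"))
    ctype = decl.find(XS + "complexType")
    if ctype is None:
        return out  # simple content (e.g. type="xs:string"): leave it empty
    # Declared attributes become empty attributes on the specimen.
    for attr in ctype.findall(XS + "attribute"):
        out.set(attr.get("name"), "")
    # Exactly one occurrence of every child element, whatever its min/maxOccurs.
    seq = ctype.find(XS + "sequence")
    if seq is not None:
        for child_decl in seq.findall(XS + "element"):
            out.append(specimen_from_decl(child_decl))
    return out


def specimen_from_schema(xsd_text: str) -> ET.Element:
    schema = ET.fromstring(xsd_text)
    # Assumption: the first global xs:element is the document root.
    return specimen_from_decl(schema.find(XS + "element"))


SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="RootElement"><xs:complexType><xs:sequence>
    <xs:element name="XMLTagOne"><xs:complexType><xs:sequence>
      <xs:element type="xs:string" name="value1"/>
    </xs:sequence></xs:complexType></xs:element>
    <xs:element name="RepeatableElementOne" maxOccurs="unbounded" minOccurs="0">
      <xs:complexType><xs:sequence>
        <xs:element type="xs:string" name="value2"/>
        <xs:element name="RepeatableElemenTwo" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType><xs:sequence>
            <xs:element type="xs:string" name="value3"/>
          </xs:sequence></xs:complexType>
        </xs:element>
      </xs:sequence></xs:complexType>
    </xs:element>
  </xs:sequence></xs:complexType></xs:element>
</xs:schema>"""

specimen = specimen_from_schema(SCHEMA)
print(ET.tostring(specimen, encoding="unicode"))
```

Applied to the question's schema, this yields the same element skeleton as the specimen XML in this answer, minus the illustrative property and attr1 attributes, which the question's schema does not declare.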
Answer 2:
Essentially, the task you are setting yourself is to write a schema processor that not only validates source documents (as any schema processor does) but also repairs them if they are invalid. I don't think you have any idea of the magnitude of this task. A general solution would involve building a finite state machine (FSM) corresponding to the grammar defined by each complex type definition in the schema, validating instances against this FSM, and then invoking the repair functionality when an element is found that does not match any path in the FSM. If you restrict yourself to supplying missing elements, then it's actually not too hard to detect that when you encounter element E in state S and there is no transition on E from S, there might be an element F that does have a transition from S, leading to a state S2 that has a transition on E. But even if you find that, you still face a couple more challenges: you might have to choose between F, G, and H, which would all be possible "repair" elements; you might find that you need to insert more than one element to make a repair (which starts to involve some complex graph searching); and once you have found an element that you want to insert, you have to construct an instance of that element whose content is valid.
It would make a good PhD project.
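To make the "not too hard" special case concrete, here is a Python sketch for the simplest possible content model: a flat xs:sequence with no xs:choice, in which each particle has a distinct element name. There the state machine is a linear chain, so "no transition on E from S" just means moving past particles until one matches, inserting placeholders for any skipped particle whose minOccurs is not yet satisfied. The (name, minOccurs, maxOccurs) tuple encoding and the repair_sequence name are hypothetical, and none of the harder problems above (choosing among candidate repairs, building valid content for an inserted element) are handled.

```python
from typing import List, Tuple

INF = float("inf")  # stands in for maxOccurs="unbounded"
Particle = Tuple[str, int, float]  # (element name, minOccurs, maxOccurs)


def repair_sequence(particles: List[Particle], children: List[str]) -> List[str]:
    """Return the child names with missing required elements inserted.

    `state` is the index of the particle currently being matched and `count`
    is how many times it has matched so far. A child that cannot be matched
    moves the machine forward; any particle left behind before reaching its
    minOccurs is repaired by inserting placeholder names.
    """
    out: List[str] = []
    state, count = 0, 0
    for name in children:
        while state < len(particles):
            pname, lo, hi = particles[state]
            if name == pname and count < hi:
                break  # a transition on `name` exists from the current state
            out.extend([pname] * max(0, lo - count))  # pad skipped required elements
            state, count = state + 1, 0
        else:
            # no transition at all: wrong order, too many occurrences, or undeclared
            raise ValueError(f"unexpected element {name!r}")
        out.append(name)
        count += 1
    # After the last child, pad any remaining required particles.
    while state < len(particles):
        pname, lo, _ = particles[state]
        out.extend([pname] * max(0, lo - count))
        state, count = state + 1, 0
    return out


root_seq = [("XMLTagOne", 1, 1), ("RepeatableElementOne", 0, INF)]
inner_seq = [("value2", 1, 1), ("RepeatableElemenTwo", 0, INF)]
print(repair_sequence(root_seq, ["RepeatableElementOne", "RepeatableElementOne"]))
# → ['XMLTagOne', 'RepeatableElementOne', 'RepeatableElementOne']
print(repair_sequence(inner_seq, ["value2"]))
# → ['value2']
```

Note the second call: a RepeatableElementOne containing only value2 comes back unchanged, because RepeatableElemenTwo has minOccurs="0" and is therefore not missing from a validator's point of view. Adding it anyway, as the question asks, goes beyond what schema-driven repair alone would insert.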
Answer 3:
IMHO, your best course of action is to insert (manually) the required elements into your XSLT stylesheet, for example:
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/RootElement">
    <xsl:copy>
      <XMLTagOne>
        <value1>
          <xsl:value-of select="XMLTagOne/value1"/>
        </value1>
      </XMLTagOne>
      <xsl:apply-templates select="RepeatableElementOne" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="RepeatableElementOne">
    <xsl:copy>
      <value2>
        <xsl:value-of select="value2"/>
      </value2>
      <xsl:choose>
        <xsl:when test="RepeatableElemenTwo">
          <xsl:apply-templates select="RepeatableElemenTwo" />
        </xsl:when>
        <xsl:otherwise>
          <RepeatableElemenTwo>
            <value3/>
          </RepeatableElemenTwo>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Source: https://stackoverflow.com/questions/29727539/xsl-t-adding-missing-schema-elements-from-input-xml-as-empty-tags