XSL-T Adding Missing schema elements from input xml as empty tags

社会主义新天地 submitted on 2019-12-11 03:06:09

Question


I am relatively new to XSLT. My requirement is rather simple: I want to add the elements that are defined in the schema but missing from the XML, as empty tags.

For example,

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified">
  <xs:element name="RootElement">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="XMLTagOne">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element type="xs:string" name="value1"/>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
            <xs:element name="RepeatableElementOne" maxOccurs="unbounded" minOccurs="0">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element type="xs:string" name="value2"/>
                     <xs:element name="RepeatableElemenTwo" maxOccurs="unbounded" minOccurs="0">
                        <xs:complexType>
                           <xs:sequence>
                              <xs:element type="xs:string" name="value3"/>
                           </xs:sequence>
                        </xs:complexType>
                     </xs:element>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
      </xs:complexType>
  </xs:element>
</xs:schema>

Consider this input:

<RootElement>
    <RepeatableElementOne>
        <value2>bb</value2>
        <RepeatableElemenTwo>
            <value3>cc</value3>
        </RepeatableElemenTwo>
        <RepeatableElemenTwo>
            <value3>dd</value3>
        </RepeatableElemenTwo>
    </RepeatableElementOne>
    <RepeatableElementOne>
        <value2>ee</value2>
    </RepeatableElementOne>
</RootElement>

For this input, I want the elements <XMLTagOne> and <RepeatableElemenTwo> to be added as empty tags.

Expected output:

<RootElement>
    <XMLTagOne>               <!-- added as an empty tag although not present in the input -->
        <value1></value1>
    </XMLTagOne>
    <RepeatableElementOne>
        <value2>bb</value2>
        <RepeatableElemenTwo>
            <value3>cc</value3>
        </RepeatableElemenTwo>
        <RepeatableElemenTwo>
            <value3>dd</value3>
        </RepeatableElemenTwo>
    </RepeatableElementOne>
    <RepeatableElementOne>
        <value2>ee</value2>
        <RepeatableElemenTwo>
            <value3></value3>
        </RepeatableElemenTwo>
    </RepeatableElementOne>
</RootElement>

After some initial research, I discovered that I have to traverse every node with an identity template matching all elements. Can you suggest how I could approach this problem? Thanks.

EDIT

My design approach:

  1. Create an intermediate XML document based on the XSD, something like this:

<Root>
<a></a>
<b></b>
</Root>

  2. Traverse all the individual nodes (with an identity template?).
  3. Get the value of each node from the source XML that I have in hand.

Problems with this approach:

  • Repeating elements.
  • I will have to check whether the count is > 1. If so, use <xsl:for-each> to process the nodes from the source document.

Answer 1:


Let us suppose you have what you called an intermediate XML document based on the XSD schema (more on this at the end); from now on I will call it the specimen XML.

Specimen XML:

<RootElement>
    <XMLTagOne>
        <value1 property=""> </value1>
    </XMLTagOne>
    <RepeatableElementOne attr1="">
        <value2></value2>
        <RepeatableElemenTwo>
            <value3></value3>
        </RepeatableElemenTwo>
    </RepeatableElementOne>
</RootElement>

(I added a couple of attributes to show that the proposed solution works for them too)

Input XML:

<RootElement>
    <RepeatableElementOne attr1="lorem ipsum">
        <value2>bb</value2>
        <RepeatableElemenTwo>
            <value3>cc</value3>
        </RepeatableElemenTwo>
        <RepeatableElemenTwo>
            <value3>dd</value3>
        </RepeatableElemenTwo>
    </RepeatableElementOne>
    <RepeatableElementOne>
        <value2>ee</value2>
    </RepeatableElementOne>
</RootElement>

(as per input in the OP, with an added attribute)

XSLT 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <!-- 
            some node in the input document could be missing,
            so we must apply the templates to the specimen document nodes
        -->
        <xsl:apply-templates select="document('specimen.xml')/*">
            <xsl:with-param name="instanceNode" select="*"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:param name="instanceNode"/>
        <xsl:attribute name="{name(.)}">
            <xsl:value-of select="$instanceNode"/>
        </xsl:attribute>
    </xsl:template>

    <xsl:template match="*">
        <xsl:param name="instanceNode"/>
        <xsl:choose>
            <xsl:when test="$instanceNode">
                <!-- the node is present in the input document -->
                <xsl:copy>
                    <!-- attributes -->
                    <xsl:for-each select="@*">
                        <xsl:apply-templates select=".">
                            <xsl:with-param name="instanceNode" select="$instanceNode/@*[name() = name(current())]"/>
                        </xsl:apply-templates>
                    </xsl:for-each>
                    <!-- elements -->
                    <xsl:for-each select="*">
                        <xsl:variable name="specimenNode" select="."/>
                        <xsl:variable name="instanceNodes" select="$instanceNode/*[name() = name(current())]"/>
                        <xsl:choose>
                            <xsl:when test="$instanceNodes">
                                <!-- one or more elements in the input file -->
                                <xsl:for-each select="$instanceNodes">
                                    <xsl:apply-templates select="$specimenNode">
                                        <xsl:with-param name="instanceNode" select="."/>
                                    </xsl:apply-templates>
                                </xsl:for-each>
                            </xsl:when>
                            <xsl:otherwise>
                                <!-- missing element in the input file -->
                                <xsl:apply-templates select="$specimenNode">
                                    <xsl:with-param name="instanceNode" select="''"/>
                                </xsl:apply-templates>
                            </xsl:otherwise>
                        </xsl:choose>
                    </xsl:for-each>
                    <!-- text nodes -->
                    <!-- (working hypothesis: no mixed content) -->
                    <xsl:value-of select="$instanceNode/text()"/>
                </xsl:copy>
            </xsl:when>
            <xsl:otherwise>
                <!-- the node is missing in the input document -->
                <xsl:copy>
                    <xsl:apply-templates select="* | @*">
                        <xsl:with-param name="instanceNode" select="''"/>
                    </xsl:apply-templates>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

Resulting output:

<RootElement>
   <XMLTagOne>
      <value1 property=""/>
   </XMLTagOne>
   <RepeatableElementOne attr1="lorem ipsum">
      <value2>bb</value2>
      <RepeatableElemenTwo>
         <value3>cc</value3>
      </RepeatableElemenTwo>
      <RepeatableElemenTwo>
         <value3>dd</value3>
      </RepeatableElemenTwo>
   </RepeatableElementOne>
   <RepeatableElementOne attr1="">
      <value2>ee</value2>
      <RepeatableElemenTwo>
         <value3/>
      </RepeatableElemenTwo>
   </RepeatableElementOne>
</RootElement>

Notable points:

  • the templates in the stylesheet are applied to the nodes in the specimen XML, as they need to be executed for the elements missing in the input file
  • checking attributes is the easy part, as they are either present or missing in the input XML; we create them from the specimen XML and populate them with the value from the input XML, if it exists
  • checking elements is a bit trickier, as they can be repeated (so for each element in the specimen XML we must loop over the corresponding elements in the input XML)
  • I worked under the hypothesis that there is no mixed content, so each element contains either text nodes or other elements; this allows us to just copy the text nodes found in the input file

As to how to get the specimen XML:

  • in the most general situation, an XML Schema can define a nested structure with unlimited nesting depth (for example, nested <div> elements in XHTML)
  • schemas can contain choices (element <A> can contain either <B1> or <B2>), which would make the creation of an appropriate specimen XML extremely difficult, if not altogether impossible
  • if we limit ourselves to XML Schemas with no recursive types and no xs:choice or xs:all (only xs:sequence), which I believe to be a sensible hypothesis, then an XSLT transformation that produces an XML document with all the possible attributes and elements should be pretty straightforward
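
Under the restrictions in the last bullet, such a transformation could look roughly like the sketch below. It is only an illustration and makes further assumptions that are not stated above: every complex type is declared inline (no named types, no ref attributes), the schema has no target namespace, and the first global xs:element is the document root. Each declaration is emitted exactly once, whatever its minOccurs and maxOccurs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- assumption: the first global element declaration is the document root -->
    <xsl:template match="/xs:schema">
        <xsl:apply-templates select="xs:element[1]"/>
    </xsl:template>

    <!-- every named declaration becomes exactly one empty element -->
    <xsl:template match="xs:element[@name]">
        <xsl:element name="{@name}">
            <!-- attributes declared on the inline complex type, created with an empty value -->
            <xsl:for-each select="xs:complexType/xs:attribute[@name]">
                <xsl:attribute name="{@name}"/>
            </xsl:for-each>
            <!-- children of the inline xs:sequence, in schema order -->
            <xsl:apply-templates select="xs:complexType/xs:sequence/xs:element"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the schema from the question, this should give the same element structure as the specimen XML above, without the two illustrative attributes (the schema declares none). Named types, element references, xs:choice, xs:all and recursive structures are deliberately not handled.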



Answer 2:


Essentially, the task you are setting yourself is to write a schema processor that not only validates source documents (as any schema processor does) but also repairs them if they are invalid. I don't think you have any idea of the magnitude of this task. A general solution would involve building a finite state machine (FSM) corresponding to the grammar defined by each complex type definition in the schema, validating instances against this FSM, and then invoking the repair functionality when an element is found that does not match any path in the FSM.

If you restrict yourself to supplying missing elements, then it's not actually too hard to detect the problem: when you encounter element E in state S and there is no transition on E from S, there might be an element F that does have a transition from S, leading to a state S2 that has a transition on E. But even if you find that, you still have a few more challenges: you might have to choose between F, G, and H, which would all be possible "repair" elements; you might find that you need to insert more than one element to make a repair (which starts to involve some complex graph searching); and once you have found an element that you want to insert, you have to construct an instance of that element whose content is valid.

It would make a good PhD project.




Answer 3:


IMHO, your best course of action is to insert (manually) the required elements into your XSLT stylesheet, for example:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/RootElement">
    <xsl:copy>
        <XMLTagOne>
            <value1>
                <xsl:value-of select="XMLTagOne/value1"/>
            </value1>
        </XMLTagOne>
        <xsl:apply-templates select="RepeatableElementOne" />
    </xsl:copy>
</xsl:template>

<xsl:template match="RepeatableElementOne">
    <xsl:copy>
        <value2>
            <xsl:value-of select="value2"/>
        </value2>
        <xsl:choose>
            <xsl:when test="RepeatableElemenTwo">
                <xsl:apply-templates select="RepeatableElemenTwo" />
            </xsl:when>
            <xsl:otherwise>
                <RepeatableElemenTwo>
                    <value3/>
                </RepeatableElemenTwo>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>


Source: https://stackoverflow.com/questions/29727539/xsl-t-adding-missing-schema-elements-from-input-xml-as-empty-tags
