How do I prevent duplicates, in XSL?

余生长醉 submitted on 2019-11-30 20:47:10

Try the following code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output indent="yes"></xsl:output>

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

  <xsl:template match="c03/did">
    <xsl:choose>
      <xsl:when test="not(container)">
        <did>
          <!-- If no c03 container item is found, look in the c04 level for one -->
          <xsl:if test="../c04/did/container">
            <xsl:variable name="foo" select="../c04/did/container[@type='Box']/text()"/>
            <!-- If a c04 container item is found, use the info to build a c03 version -->
            <!-- Skip c03 container item, if still no c04 items found -->
            <container label="Box" type="Box">

              <!-- Build container list -->
              <!-- Test for more than one item, and if so, list them, -->
              <!-- separated by commas and a space -->
              <xsl:for-each select="distinct-values($foo)">
                <xsl:sort />
                <xsl:if test="position() &gt; 1">, </xsl:if>
                <xsl:value-of select="." />
              </xsl:for-each>
            </container>
          </xsl:if>
          <!-- Copy the remaining children even when no c04 container is found -->
          <xsl:apply-templates select="*" />
        </did>
      </xsl:when>

      <!-- If there is a c03 container item(s), list it normally -->
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

The result looks pretty much like the output you want:

<?xml version="1.0" encoding="UTF-8"?>
<c03 id="ref6488" level="file">
  <did>
      <container label="Box" type="Box">154, 156</container>
      <unittitle>Clinic Building</unittitle>
      <unitdate era="ce" calendar="gregorian">1947</unitdate>
   </did>
  <c04 id="ref34582" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <container label="Folder" type="Folder">3</container>
      </did>
  </c04>
  <c04 id="ref6540" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <unittitle>Contact prints</unittitle>
      </did>
  </c04>
  <c04 id="ref6606" level="file">
      <did>
         <container label="Box" type="Box">154</container>
         <unittitle>Negatives</unittitle>
      </did>
  </c04>
</c03>

The trick is to use <xsl:sort> and distinct-values() together. See the (IMHO) great book by Michael Kay, "XSLT 2.0 and XPath 2.0".

There is no need for an XSLT 2.0 solution for this problem.

Here is an XSLT 1.0 solution, which is more compact than the currently selected XSLT 2.0 solution (35 lines vs. 43 lines):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:key name="kBoxContainerByVal"
     match="container[@type='Box']" use="."/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="c03/did[not(container)]">
   <xsl:copy>

   <xsl:variable name="vContDistinctValues" select=
    "/*/*/*/container[@type='Box']
            [generate-id()
            =
             generate-id(key('kBoxContainerByVal', .)[1])
            ]
            "/>

    <container label="Box" type="Box">
      <xsl:for-each select="$vContDistinctValues">
        <xsl:sort data-type="number"/>

        <xsl:value-of select=
        "concat(., substring(', ', 1 + 2*(position() = last())))"/>
      </xsl:for-each>
    </container>
    <xsl:apply-templates/>
   </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the originally provided XML document, the wanted, correct result is produced:

<c03 id="ref6488" level="file">
   <did>
      <container label="Box" type="Box">154, 156</container>
      <unittitle>Clinic Building</unittitle>
      <unitdate era="ce" calendar="gregorian">1947</unitdate>
   </did>
   <c04 id="ref34582" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <container label="Folder" type="Folder">3</container>
      </did>
   </c04>
   <c04 id="ref6540" level="file">
      <did>
         <container label="Box" type="Box">156</container>
         <unittitle>Contact prints</unittitle>
      </did>
   </c04>
   <c04 id="ref6606" level="file">
      <did>
         <container label="Box" type="Box">154</container>
         <unittitle>Negatives</unittitle>
      </did>
   </c04>
</c03>

Update:

I didn't notice the requirement that the container numbers must appear sorted. Now the solution reflects this.

Try using key-based grouping in XSLT. Here is an article on the Muenchian method, which should help to eliminate duplicates: http://www.jenitennison.com/xslt/grouping/muenchian.html

A slightly shorter XSLT 2.0 version, combining approaches from other answers. Note that the sort is alphabetical, so if the values "54" and "156" are found, the output will be "156, 54". If a numerical sort is needed, use <xsl:sort select="number(.)"/> instead of <xsl:sort/>.

<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/> 
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*"> 
        <xsl:copy> 
            <xsl:apply-templates select="node()|@*"/> 
        </xsl:copy> 
    </xsl:template> 

    <xsl:template match="c03/did[not(container)]">
        <xsl:variable name="containers" 
                      select="../c04/did/container[@label='Box'][text()]"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:if test="$containers">
                <container label="Box" type="Box">
                    <xsl:for-each select="distinct-values($containers)">
                        <xsl:sort/>
                        <xsl:if test="position() != 1">, </xsl:if>
                        <xsl:value-of select="."/>
                    </xsl:for-each>
                </container> 
            </xsl:if>
            <xsl:apply-templates select="node()"/> 
        </xsl:copy> 
    </xsl:template> 
</xsl:stylesheet>
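
To see the difference between the alphabetical and the numerical sort mentioned above in isolation, here is a minimal sketch. The <boxes>/<box> input is a hypothetical document made up for illustration, not the one from the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <boxes><box>54</box><box>156</box><box>54</box></boxes> -->
  <xsl:template match="/boxes">
    <!-- distinct-values() atomizes the nodes and drops repeated values -->
    <xsl:for-each select="distinct-values(box)">
      <!-- number(.) makes the sort key numeric, so 54 sorts before 156 -->
      <xsl:sort select="number(.)"/>
      <xsl:if test="position() != 1">, </xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this input the sketch should print "54, 156". Replacing the sort with a plain <xsl:sort/> compares the values as strings and should print "156, 54" instead.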

A truly XSLT 2.0 solution, also quite short:

<xsl:stylesheet  version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
>
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="c03/did[not(container)]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>

      <xsl:variable name="vContDistinctValues" as="xs:integer*">
        <xsl:perform-sort select=
          "distinct-values(/*/*/*/container[@type='Box']/text()/xs:integer(.))">
          <xsl:sort/>
        </xsl:perform-sort>
      </xsl:variable>

      <xsl:if test="$vContDistinctValues">
        <container label="Box" type="Box">
          <xsl:value-of select="$vContDistinctValues" separator=", "/>
        </container>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Do note:

  1. Typing the values as xs:integer avoids the need to specify the data-type in <xsl:sort/>.

  2. The separator attribute of <xsl:value-of/> joins the values of the sequence, so no explicit loop is needed.
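
The two notes above can be tried on their own with a minimal sketch like the following. The <boxes>/<box> input and the variable name "sorted" are assumptions for illustration only, and the separator here is a comma followed by a space.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Hypothetical input:
       <boxes><box>156</box><box>54</box><box>156</box></boxes> -->
  <xsl:template match="/boxes">
    <xsl:variable name="sorted" as="xs:integer*">
      <!-- The values are xs:integer, so the default sort is numeric -->
      <xsl:perform-sort select="distinct-values(box/xs:integer(.))">
        <xsl:sort/>
      </xsl:perform-sort>
    </xsl:variable>
    <!-- separator joins the items; no for-each or position() test needed -->
    <xsl:value-of select="$sorted" separator=", "/>
  </xsl:template>
</xsl:stylesheet>
```

For the input shown in the comment this should print "54, 156". Note that xs:integer(.) raises an error if a value is not a valid integer, so this approach assumes purely numeric box labels.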

The following XSLT 1.0 transformation does what you are looking for:

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> 
  <xsl:output encoding="utf-8" />

  <!-- key to index containers by these three distinct qualities: 
       1: their ancestor <c??> node (represented as its unique ID)
       2: their @type attribute value
       3: their node value (i.e. their text) -->
  <xsl:key 
    name  = "kContainer" 
    match = "container"
    use   = "concat(generate-id(../../..), '|', @type, '|', .)"
  />

  <!-- identity template to copy everything as is by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- special template for <did>s without a <container> child -->
  <xsl:template match="did[not(container)]">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <container label="Box" type="Box">
        <!-- from subordinate <container>s of type Box, use the ones
             that are *the first* to have that certain combination 
             of the three distinct qualities mentioned above -->
        <xsl:apply-templates mode="list-values" select="
          ../*/did/container[@type='Box'][
            generate-id()
            =
            generate-id(
              key(
                'kContainer', 
                concat(generate-id(../../..), '|', @type, '|', .)
              )[1]
            )
          ]
        ">
          <!-- sort them by their node value -->
          <xsl:sort select="." data-type="number" />
        </xsl:apply-templates>
      </container>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>

  <!-- generic template to make list of values from any node-set -->
  <xsl:template match="*" mode="list-values">
    <xsl:value-of select="." />
    <xsl:if test="position() &lt; last()">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

It returns:

<c03 id="ref6488" level="file">
  <did>
    <container label="Box" type="Box">154, 156</container>
    <unittitle>Clinic Building</unittitle>
    <unitdate era="ce" calendar="gregorian">1947</unitdate>
  </did>
  <c04 id="ref34582" level="file">
    <did>
      <container label="Box" type="Box">156</container>
      <container label="Folder" type="Folder">3</container>
    </did>
  </c04>
  <c04 id="ref6540" level="file">
    <did>
      <container label="Box" type="Box">156</container>
      <unittitle>Contact prints</unittitle>
    </did>
  </c04>
  <c04 id="ref6606" level="file">
    <did>
      <container label="Box" type="Box">154</container>
      <unittitle>Negatives</unittitle>
    </did>
  </c04>
</c03>

The generate-id() = generate-id(key(...)[1]) part is what's called Muenchian grouping. Unless you can use XSLT 2.0, this is the way to go.
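
Stripped of the compound key, the Muenchian idiom can be sketched as below. The <boxes>/<box> input and the key name kBoxByValue are hypothetical and chosen only to keep the example small.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every <box> by its string value -->
  <xsl:key name="kBoxByValue" match="box" use="."/>

  <!-- Hypothetical input:
       <boxes><box>156</box><box>156</box><box>154</box></boxes> -->
  <xsl:template match="/boxes">
    <!-- Keep a <box> only if it is the first node the key returns for its value -->
    <xsl:for-each
        select="box[generate-id() = generate-id(key('kBoxByValue', .)[1])]">
      <xsl:sort select="." data-type="number"/>
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the input in the comment this should print "154, 156". The key groups all nodes that share a value, and the generate-id() comparison picks exactly one representative per group, which is what removes the duplicates.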
