Iterate and apply XSLT identity transformation to directory of documents?

Posted by 三世轮回 on 2020-01-06 02:44:32

Question


I have a folder with HTML files which look more or less like this:

<div id="d10e3019" class="content">
   <h1>header</h1>
   <div class="adv">
      <div class="warn">
         <img width="60px" height="20px" src="img/warn.png" alt="WARNING"></img>
         <p class="cause">gfgfg!</p>
         <p>⇒  Thgfh</p>
         <p class="step">⇔ 
            <span class="emphasis">hgfh
         </p>
      </div>
   </div>
</div>

They all have <div id="someId" class="content"> as root element and then just various HTML markup.

I need to change all of the src attributes in the img tags of each document to look like this:

<img width="60px" height="20px" src="http://server.com/{$nameOfTheCurtentFolder}/img/warn.png" alt="WARNING"></img>

and wrap the new output in another div with a new child element. The rest of the document needs to be exactly the same.

I tried the stylesheet below, but it writes only the text nodes to the output document (generating the new div element does work):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:fn='http://www.w3.org/2005/xpath-functions' exclude-result-prefixes='xsl xs fn' xmlns:h="http://java.sun.com/jsf/html">
    <xsl:output method="xml" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="files" select="collection('./output?select=*.html')"/>

    <xsl:template match="/">
        <xsl:for-each select="$files">
            <xsl:variable name="fileName" select="tokenize(base-uri(), '/')[last()]"/>
            <xsl:result-document method="xhtml" href="new/{$fileName}">
                <div>
                    <h:selectBooleanCheckbox value="pubs"/>
                    <xsl:copy>
                        <xsl:apply-templates select="@* | node()"/>
                    </xsl:copy>
                </div>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
    <xsl:template match="@src">
        <xsl:variable name="nameOfTheCurtentFolder" select="tokenize(base-uri(), '/')[last()-2]"/>
        <xsl:text>http://server.com/</xsl:text>
        <xsl:value-of select="$nameOfTheCurtentFolder"/>
        <xsl:text>/output/</xsl:text>
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>

The output looks like this:

              <div>
                 <h:selectBooleanCheckbox value="pubs"/>
                  headergfgfg!⇒  Thgfhhgfh
              </div>

This is a follow-up to my earlier question, Change attribute value without creating new output document?


Answer 1:


It looks like your stylesheet is missing the identity transform template:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
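
For context, the text-only output follows from XSLT's built-in template rules, which apply whenever no template in the stylesheet matches a node. As a rough sketch, a stylesheet that spells those rules out explicitly would look like the following (built-in rules for comments and processing instructions, which output nothing, are left out):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Explicit spelling of the built-in rule for document and element nodes:
       emit nothing for the node itself, just process its children. -->
  <xsl:template match="/ | *">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Explicit spelling of the built-in rule for text and attribute nodes:
       output the string value. -->
  <xsl:template match="text() | @*">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Because `<xsl:apply-templates/>` without a select attribute visits child nodes only, attributes are never reached this way, which would explain why a template matching @src does not fire on its own and only the concatenated text appears.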

In addition, in order to change the value of an attribute, you need to recreate it first:

<xsl:template match="@src">
    <xsl:attribute name="src">
        <xsl:variable name="nameOfTheCurtentFolder" select="tokenize(base-uri(), '/')[last()-2]"/>
        <xsl:text>http://server.com/</xsl:text>
        <xsl:value-of select="$nameOfTheCurtentFolder"/>
        <xsl:text>/output/</xsl:text>
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

Finally, the HTML document to be processed must also be well-formed XML; the provided example is not.
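
As a variation on the @src template above, XSLT 2.0 also allows the attribute value to be computed in a select attribute on xsl:attribute. The following is a minimal sketch, not the original poster's code: the `baseUrl` parameter is a hypothetical addition, and the match pattern is narrowed to `img/@src` on the assumption that only image sources should change.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter, not in the original: server prefix for images. -->
  <xsl:param name="baseUrl" select="'http://server.com/'"/>

  <!-- Identity transform: copy everything not matched below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild img/@src; the select attribute on xsl:attribute is XSLT 2.0. -->
  <xsl:template match="img/@src">
    <xsl:attribute name="src"
                   select="concat($baseUrl,
                                  tokenize(base-uri(), '/')[last()-2],
                                  '/output/', .)"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the server prefix in a parameter means it could be overridden at invocation time without editing the stylesheet.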




Answer 2:


Got it to work, too, only 5 minutes too late (compared with michael.hor257k):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:fn='http://www.w3.org/2005/xpath-functions' exclude-result-prefixes='xsl xs fn' xmlns:h="http://java.sun.com/jsf/html">
    <xsl:output method="xml" encoding="utf-8"/>
    <xsl:strip-space elements="*"/>
    <xsl:param name="files" select="collection('./output/*.xml')"/>
    <xsl:template match="/">
        <xsl:for-each select="$files">
            <xsl:variable name="fileName" select="tokenize(base-uri(), '/')[last()]"/>
            <xsl:result-document method="xhtml" href="new/{$fileName}">
                <div>
                    <h:selectBooleanCheckbox value="pubs"/>
                    <xsl:copy>
                        <xsl:apply-templates/>
                    </xsl:copy>
                </div>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="@src">
        <xsl:attribute name="src">
            <xsl:variable name="nameOfTheCurtentFolder" select="tokenize(base-uri(), '/')[last()-2]"/>
            <xsl:text>http://server.com/</xsl:text>
            <xsl:value-of select="$nameOfTheCurtentFolder"/>
            <xsl:text>/output/</xsl:text>
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>



Answer 3:


General Solution

XSLT has no standard way to iterate over a directory of files; you are expected to drive the transformation from an external controller to achieve that. Saxon, however, extends the collection() function to accept a directory URI with a ?select= pattern, which can do this.

Iterating over input documents and applying the identity transformation

The following XSLT will apply an adjusted identity transformation to all `$inSubdirName` HTML files and place the results in `$outSubdirName`:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version='2.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="inSubdirName" select="'in'"/>
  <xsl:param name="outSubdirName" select="'out'"/>

  <!-- Driver for each file in inSubdirName -->
  <xsl:template match="/">
    <xsl:for-each select="collection(concat($inSubdirName, '/?select=*.html'))">
      <xsl:variable name="inFileName" select="base-uri()"/>
      <xsl:variable name="outFileName"
                    select="concat($outSubdirName, '/',
                            tokenize(base-uri(), '/')[last()])"/>
      <xsl:message select="concat('Transforming from ',
                           $inFileName, ' to ', $outFileName)"/>
      <xsl:result-document method="xhtml" href="{$outFileName}">
        <xsl:apply-templates select="@*|node()"/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity transform overrides -->
  <xsl:template match="/div">
    <div xmlns:h="http://java.sun.com/jsf/html">
      <h:selectBooleanCheckbox value="pubs"/>
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </div>
  </xsl:template>

  <xsl:template match="@src">
    <xsl:attribute name="src">
      <xsl:variable name="nameOfTheCurrentFolder"
                    select="tokenize(base-uri(), '/')[last()-2]"/>
      <xsl:text>http://server.com/</xsl:text>
      <xsl:value-of select="$nameOfTheCurrentFolder"/>
      <xsl:text>/output/</xsl:text>
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

When applied to input HTML files such as the one you provided, corrected to be well-formed:

<div id="d10e3019" class="content">
   <h1>header 1</h1>
   <div class="adv">
      <div class="warn">
         <img width="60px" height="20px" src="img/warn.png" alt="WARNING"></img>
         <p class="cause">gfgfg!</p>
         <p>⇒  Thgfh</p>
         <p class="step">⇔ 
            <span class="emphasis">hgfh</span>
         </p>
      </div>
   </div>
</div>

the stylesheet transforms them, adding the new @src attribute value, the new wrapping div, and the h:selectBooleanCheckbox child element,

<div xmlns:h="http://java.sun.com/jsf/html">
   <h:selectBooleanCheckbox value="pubs"></h:selectBooleanCheckbox>
   <div id="d10e3019" class="content">
      <h1>header 1</h1>
      <div class="adv">
         <div class="warn">
            <img width="60px" height="20px" src="http://server.com/xslt/output/img/warn.png" alt="WARNING"></img>
            <p class="cause">gfgfg!</p>
            <p>⇒  Thgfh</p>
            <p class="step">⇔ 

               <span class="emphasis">hgfh</span>
            </p>
         </div>
      </div>
   </div>
</div>

as requested, with a declaration added for the h namespace prefix to ensure that the output is well-formed.

Note also these improvements to your original XSLT:

  • Input and output directories are parameterized.
  • The driver XSLT that does iteration is fully isolated from the identity transformation XSLT templates which are responsible for applying changes.
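
One further refinement, offered only as a sketch: the driver above matches "/" and therefore needs some source document to start from, even though it never reads it. Assuming Saxon is the processor, the driver could instead be a named template used as the initial template. The template name `main` is an arbitrary choice, and where relative output paths end up depends on how the processor is invoked.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="inSubdirName" select="'in'"/>
  <xsl:param name="outSubdirName" select="'out'"/>

  <!-- Driver as a named template (the name "main" is an arbitrary choice).
       With Saxon's command line it could be started via -it:main
       instead of supplying a source document. -->
  <xsl:template name="main">
    <xsl:for-each select="collection(concat($inSubdirName, '/?select=*.html'))">
      <xsl:result-document method="xhtml"
          href="{concat($outSubdirName, '/', tokenize(base-uri(), '/')[last()])}">
        <xsl:apply-templates select="node()"/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity transform; overrides would be added as in the stylesheet above. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```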


Source: https://stackoverflow.com/questions/34810899/iterate-and-apply-xslt-identity-transformation-to-directory-of-documents
