Replacing strings in various XML files

Submitted by 老子叫甜甜 on 2019-11-29 16:00:51

If both XQuery and XSLT are options, you're probably using an XSLT 2.0 processor. If so, this should work:

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="search" select="'Bird'"/>
    <xsl:param name="replace" select="'Dog'"/>

    <xsl:template match="@*|*|comment()|processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="{$search}">
            <xsl:matching-substring><xsl:value-of select="$replace"/></xsl:matching-substring>
            <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>

Using the XML input from the question, this XSLT produces the following output:

<something>
   <parent>
      <child>Dog is the word 1.</child>
      <child>Curd is the word 2.</child>
      <child>Nerd is the word 3.</child>
   </parent>
   <parent>
      <child>Dog is the word 4.</child>
      <child>Word is the word 5.</child>
      <child>Dog is the word 6.</child>
   </parent>
</something>

Note: Only text nodes are rewritten. Elements, attributes, comments, and processing instructions are copied to the output unaltered.
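
One caveat worth keeping in mind: because regex="{$search}" hands the search string to the regular-expression engine, a term containing metacharacters such as . or ( is interpreted as a pattern rather than as literal text. The following is a minimal sketch of one way to guard against that, assuming an XSLT 2.0 processor. The my:escape-regex function, its urn:example:my namespace, and the sample parameter values are hypothetical names introduced here for illustration; they are not part of the original answer.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Sample values (assumptions): a search term containing a regex metacharacter -->
    <xsl:param name="search" select="'word 1.'"/>
    <xsl:param name="replace" select="'word one.'"/>

    <!-- Hypothetical helper: put a backslash before each regex metacharacter
         so that the term is matched literally -->
    <xsl:function name="my:escape-regex" as="xs:string">
        <xsl:param name="literal" as="xs:string"/>
        <xsl:sequence select="replace($literal, '[.\[\]\\|\-\^\$?*+{}()]', '\\$0')"/>
    </xsl:function>

    <!-- Identity template: copies everything except text nodes unchanged -->
    <xsl:template match="@*|*|comment()|processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Text nodes: replace literal occurrences of $search with $replace -->
    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="{my:escape-regex($search)}">
            <xsl:matching-substring><xsl:value-of select="$replace"/></xsl:matching-substring>
            <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

Without the escaping step, the dot in the sample search term would act as a wildcard instead of matching only a literal full stop. If the search terms are known to be plain words, as in the question, the simpler stylesheet above is sufficient.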


EDIT

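You're getting duplicates because your xsl:for-each loops over the two word elements and outputs the text once per iteration. If you had 3 word elements, it would output the text 3 times.

Instead of looping, join all the search terms into a single alternation, here (Bird|word), and look up the replacement for whichever term matched: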

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:param name="list">
        <words>
            <word>
                <search>Bird</search>
                <replace>Dog</replace>
            </word>
            <word>
                <search>word</search>
                <replace>man</replace>
            </word>
        </words>
    </xsl:param>

    <xsl:template match="@*|*|comment()|processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:variable name="search" select="concat('(',string-join($list/words/word/search,'|'),')')"/>
        <xsl:analyze-string select="." regex="{$search}">
            <xsl:matching-substring>
                <xsl:value-of select="$list/words/word[search=current()]/replace"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

This will produce:

<something>
   <parent>
      <child>Dog is the man 1.</child>
      <child>Curd is the man 2.</child>
      <child>Nerd is the man 3.</child>
   </parent>
   <parent>
      <child>Dog is the man 4.</child>
      <child>Word is the man 5.</child>
      <child>Dog is the man 6.</child>
   </parent>
</something>

This should do it in XSLT 1.0:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="findText" select="'Bird'" />
  <xsl:param name="replaceText" select="'Dog'" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="." />
      <xsl:with-param name="replace" select="$findText" />
      <xsl:with-param name="by" select="$replaceText" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="string-replace-all">
    <xsl:param name="text" />
    <xsl:param name="replace" />
    <xsl:param name="by" />
    <xsl:choose>
      <xsl:when test="contains($text, $replace)">
        <xsl:value-of select="substring-before($text,$replace)" />
        <xsl:value-of select="$by" />
        <xsl:call-template name="string-replace-all">
          <xsl:with-param name="text"
          select="substring-after($text,$replace)" />
          <xsl:with-param name="replace" select="$replace" />
          <xsl:with-param name="by" select="$by" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Note that I have specified 'Bird' and 'Dog' as default values for the parameters so I can easily demonstrate the result, but it should be possible to pass in values for these parameters from external code. When run on your sample input, this produces:

<something>
  <parent>
    <child>Dog is the word 1.</child>
    <child>Curd is the word 2.</child>
    <child>Nerd is the word 3.</child>
  </parent>
  <parent>
    <child>Dog is the word 4.</child>
    <child>Word is the word 5.</child>
    <child>Dog is the word 6.</child>
  </parent>
</something>

I think the trick is to understand that the document model is different from string parsing. Once you have that, this use case is easy enough in either XQuery or XSLT; which one you choose is a matter of taste. Here is a crude approach in XQuery. A more refined solution might use recursive function calls, à la http://docs.marklogic.com/4.1/guide/app-dev/typeswitch

let $in := <something>
  <parent>
    <child>Bird is the word 1.</child>
    <child>Curd is the word 2.</child>
    <child>Nerd is the word 3.</child>
  </parent>
  <parent>
    <child>Bird is the word 4.</child>
    <child>Word is the word 5.</child>
    <child>Bird is the word 6.</child>
  </parent>
</something>
return element { node-name($in) } {
  $in/@*,
  for $n in $in/node()
  return typeswitch($n)
  case element(parent) return element { node-name($n) } {
    for $c in $n/node()
    return typeswitch($c)
    case element(child) return element { node-name($c) } {
      replace($c, 'Bird', 'Dog') }
    default return $c }
  default return $n }

Here's another XQuery option...

declare function local:searchReplace($element as element()) {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()
        return 
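            if ($child instance of element())
            then
                local:searchReplace($child)
            else if ($child instance of text())
            then
                (: only text nodes are rewritten :)
                text { replace($child,'Bird','Dog') }
            else
                (: comments and processing instructions are copied as-is :)
                $child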
    }
};

local:searchReplace(/*)

This also produces the same output as my XSLT 2.0 answer:

<something>
      <parent>
            <child>Dog is the word 1.</child>
            <child>Curd is the word 2.</child>
            <child>Nerd is the word 3.</child>
      </parent>
      <parent>
            <child>Dog is the word 4.</child>
            <child>Word is the word 5.</child>
            <child>Dog is the word 6.</child>
      </parent>
</something>