convert character if codepoint within given range

Posted by 天涯浪子 on 2019-12-12 14:27:55

Question


I have a couple of XML files that contain Unicode characters with codepoint values between 57600 and 58607. Currently these are rendered in my content as square blocks, and I'd like to convert them to elements.

So what I'd like to achieve is something like this:

<!-- current input -->
<p> Follow the on-screen instructions.</p>  
<!-- desired output-->
<p><unichar value="58208"/> Follow the on-screen instructions.</p>
<!-- Where 58208 is the actual codepoint of the unicode character in question -->

I've experimented a bit with tokenize(), but since it needs a pattern to split on, this turned out to be overcomplicated.

Any advice on how best to tackle this? I've been trying things like the pseudocode below but got stuck (never mind the syntax, I know it isn't valid XSLT):

<xsl:template match="text()">
 -> for every character in my string
    -> if string-to-codepoints(current character) greater than 57600 return <unichar value="codepoint value"/>
       else return character
</xsl:template>

Answer 1:


This sounds like a job for xsl:analyze-string (XSLT 2.0), e.g.:

<xsl:template match="text()">
  <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
    <xsl:matching-substring>
       <unichar value="{string-to-codepoints(.)}"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Untested.
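
To run the template above on a whole document, it has to sit inside a stylesheet that also copies everything else through. A minimal sketch, assuming an XSLT 2.0 processor (for example Saxon); the identity template and the priority attribute are additions that are not part of the original answer:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Assumption: everything except text is copied unchanged (identity template). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" avoids a conflict with node() above, which has the same
       default priority as text(). -->
  <xsl:template match="text()" priority="1">
    <!-- The regex matches one character at a time in the range 57600-58607. -->
    <xsl:analyze-string select="." regex="[&#57600;-&#58607;]">
      <xsl:matching-substring>
        <!-- Each match is a single character, so this yields one integer. -->
        <unichar value="{string-to-codepoints(.)}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because the regex matches exactly one character per hit, a run of several such characters should produce one unichar element per character.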




Answer 2:


This transformation:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="/*">
     <p>
      <xsl:for-each select="string-to-codepoints(.)">
        <xsl:choose>
            <xsl:when test=". ge 57600 and . le 58607">
              <unichar value="{.}"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="codepoints-to-string(.)"/>
            </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
     </p>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<p> Follow the on-screen instructions.</p>

produces the wanted, correct result:

<p><unichar value="58498"/> Follow the on-screen instructions.</p>

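Explanation: string-to-codepoints() turns the text into a sequence of integer codepoints. Each codepoint that passes the test is emitted as a unichar element, and every other codepoint is converted back to a character with codepoints-to-string(). Both are standard XPath 2.0 functions.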
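
The stylesheet above matches only the document element and hardcodes the p wrapper. A possible generalization, sketched here under the assumption that the characters can occur in any text node, applies the same codepoint loop per text node and bounds the test at both ends of the 57600 to 58607 range. The identity template is an addition, not part of the original answer:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Assumption: elements, attributes, comments and PIs are copied unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Higher priority than node() so text nodes always land here. -->
  <xsl:template match="text()" priority="1">
    <!-- Iterate over the integer codepoints of the text node. -->
    <xsl:for-each select="string-to-codepoints(.)">
      <xsl:choose>
        <!-- Inside the range: emit an element carrying the codepoint. -->
        <xsl:when test=". ge 57600 and . le 58607">
          <unichar value="{.}"/>
        </xsl:when>
        <!-- Outside the range: turn the codepoint back into its character. -->
        <xsl:otherwise>
          <xsl:value-of select="codepoints-to-string(.)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Compared with xsl:analyze-string, this visits every character individually, so it may be slower on large text nodes, but the range check is written as plain integer comparisons.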



Source: https://stackoverflow.com/questions/10798974/convert-character-if-codepoint-within-given-range
