XSLT: normalize non-breaking whitespace characters

长情又很酷 2021-01-20 18:09

I have a sample XML file like this:


    

text1 text2

text1 text2

text1 text2   

1 answer
  •  没有蜡笔的小新
    2021-01-20 18:23

    You could do:

    
    

    This will work in XSLT 1.0 and 2.0 alike.
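A minimal sketch of an approach that runs under both versions, assuming the goal is to turn each non-breaking space (U+00A0) into an ordinary space and then collapse runs of whitespace. The identity template and the choice to match every text() node are assumptions made for illustration, not something stated in the question:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes (assumed target): map U+00A0 to U+0020 with translate(),
       then let normalize-space() trim and collapse the whitespace. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>
  </xsl:template>

</xsl:stylesheet>
```

To delete the characters instead of normalizing them, list them in the second argument of translate() and leave the third argument empty, e.g. translate(., ' &#160;', ''). Note that normalize-space() also empties whitespace-only text nodes, so indentation between elements is dropped by this sketch.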


    In XSLT 2.0, you could also use a regular expression, for example:

    
    

    will remove the horizontal tab character as well as any character in the Unicode Space_Separator (Zs) category, which includes not only the space and non-breaking space characters but also other space characters. Documentation is hard to find, but I believe this is currently the complete list (extracted from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt):

    U+0020 SPACE
    U+00A0 NO-BREAK SPACE
    U+1680 OGHAM SPACE MARK
    U+2000 EN QUAD
    U+2001 EM QUAD
    U+2002 EN SPACE
    U+2003 EM SPACE
    U+2004 THREE-PER-EM SPACE
    U+2005 FOUR-PER-EM SPACE
    U+2006 SIX-PER-EM SPACE
    U+2007 FIGURE SPACE
    U+2008 PUNCTUATION SPACE
    U+2009 THIN SPACE
    U+200A HAIR SPACE
    U+202F NARROW NO-BREAK SPACE
    U+205F MEDIUM MATHEMATICAL SPACE
    U+3000 IDEOGRAPHIC SPACE
    
    𐲰 OLD HUNGARIAN CAPITAL LETTER EZS
    𐳰 OLD HUNGARIAN SMALL LETTER EZS
    𖼶 MIAO LETTER ZSHA
    𖼼 MIAO LETTER ZSA
    𖼾 MIAO LETTER ZZSA
    𖽁 MIAO LETTER ZZSYA
    

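    However, testing with Saxon 9.5 shows that the last 6 characters are not recognized: http://xsltransform.net/ncntCSo. That is the expected result: they are Old Hungarian and Miao letters, not space separators, and most likely ended up in the list only because their names contain "ZS".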
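The regex approach described above can be sketched as a complete XSLT 2.0 stylesheet. This is an illustration under assumptions (an identity transform, applied to every text node), not necessarily the exact expression from the answer:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes (assumed target): delete tabs and every character in the
       Unicode Space_Separator (Zs) category. \t and \p{Zs} are part of the
       regex syntax accepted by the XPath 2.0 replace() function. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '[\t\p{Zs}]', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

If the aim is to normalize rather than remove, one option is to replace each run with a single space and then trim: normalize-space(replace(., '[\t\p{Zs}]+', ' ')).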
