Question
I have the following XML:
<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
I want to show only the first 200 characters, but the text must not be cut off in the middle of a word, and I want to keep the formatting elements. So the fragment above should become, after the transformation:
<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ...</p>
Does anyone know if this is possible? Thanks in advance!
Answer 1:
This XSLT 2.0 transformation:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="pmaxChars" as="xs:integer" select="200"/>

  <xsl:variable name="vPass1">
    <xsl:apply-templates select="/*"/>
  </xsl:variable>

  <xsl:template match="node()|@*" mode="#default pass2">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$vPass1" mode="pass2"/>
  </xsl:template>

  <xsl:template match=
    "text()[sum(preceding::text()/string-length()) ge $pmaxChars]"/>

  <xsl:template match="text()[not(following::text())]" mode="pass2">
    <xsl:variable name="vPrecedingLength"
         select="sum(preceding::text()/string-length())"/>
    <xsl:variable name="vRemaininingLength"
         select="$pmaxChars - $vPrecedingLength"/>
    <xsl:sequence select=
      "replace(.,
               concat('(^.{0,', $vRemaininingLength, '})\W.*'),
               '$1'
              )
      "/>
  </xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
produces the wanted, correct result (an XML document in which the total length of all text nodes doesn't exceed 200, the truncation is performed on a word boundary, and this is the truncation with the maximum possible total string-length remaining):
<p>Lorem ipsum dolor sit amet, <b>consectetur adipisicing</b> elit, <i>sed do<sup>2</sup></i> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut</p>
Explanation:
This is a generic solution that accepts the maximum number of text characters as a global/external parameter, $pmaxChars.
This is a two-pass solution. In pass 1 the identity rule is overridden by a template that deletes every text node whose first character has an index (in the concatenation of all text nodes) greater than the maximum number of allowed characters. Thus, the result of pass 1 is an XML document in which the "break" at the maximum allowed length occurs in the last text node.
In pass 2 we override the identity rule with a template that matches the last text node. We use the replace() function:
....
replace(.,
concat('(^.{0,', $vRemaininingLength, '})\W.*'),
'$1'
)
This causes the complete string to be matched and replaced by the subexpression in parentheses. This subexpression is constructed dynamically and matches the longest substring that starts at the beginning of the string, contains from 0 to $vRemaininingLength characters (the maximum allowed length minus the total length of all preceding text nodes), and is immediately followed by a non-word character (\W), which marks a word boundary.
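To see the pattern in isolation, here is a minimal sketch that applies the same replace() call to a literal string. The parameter names pText and pRemaining are hypothetical and exist only for this illustration; the stylesheet can be run against any XML input, since it ignores the source document.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <!-- Hypothetical inputs: the text to trim and the character budget left for it -->
  <xsl:param name="pText" as="xs:string"
       select="'laboris nisi ut aliquip ex ea'"/>
  <xsl:param name="pRemaining" as="xs:integer" select="14"/>

  <xsl:template match="/">
    <!-- Same pattern as in the answer: the longest prefix of at most
         $pRemaining characters that is followed by a non-word character -->
    <xsl:value-of select=
      "replace($pText,
               concat('(^.{0,', $pRemaining, '})\W.*'),
               '$1')"/>
    <!-- With the defaults above this outputs: laboris nisi -->
  </xsl:template>
</xsl:stylesheet>
```

The 14-character prefix "laboris nisi u" is followed by a word character, so the engine backtracks to the 12-character prefix "laboris nisi", which is followed by a space. Note that the pattern always requires a non-word character after the captured prefix: with pRemaining set to 200, the same input yields "laboris nisi ut aliquip ex", so a string that already fits within the budget still loses its last word.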
UPDATE:
To get rid of elements that, because of the trimming, no longer have any text node descendants (that is, are "empty"), simply add this template:
<xsl:template match=
"*[(.//text())[1][sum(preceding::text()/string-length()) ge $pmaxChars]]"/>
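The question also asked for a trailing "...", which the stylesheet above does not emit. What follows is a hedged sketch of one way to add it, not part of the original answer. It assumes the ellipsis should not count against $pmaxChars, and it introduces a hypothetical global variable, vTruncated, so that the last text node is trimmed only when the source text actually exceeds the limit. Without that guard, the \W in the pattern would drop the last word even from text that already fits.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="pmaxChars" as="xs:integer" select="200"/>

  <!-- Hypothetical addition: true only if the source text exceeds the limit -->
  <xsl:variable name="vTruncated" as="xs:boolean"
       select="sum(//text()/string-length()) gt $pmaxChars"/>

  <xsl:variable name="vPass1">
    <xsl:apply-templates select="/*"/>
  </xsl:variable>

  <!-- Identity rule, shared by both passes -->
  <xsl:template match="node()|@*" mode="#default pass2">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$vPass1" mode="pass2"/>
  </xsl:template>

  <!-- Pass 1: drop text nodes, and elements whose first text node,
       start at or beyond the limit -->
  <xsl:template match=
    "text()[sum(preceding::text()/string-length()) ge $pmaxChars]"/>
  <xsl:template match=
    "*[(.//text())[1][sum(preceding::text()/string-length()) ge $pmaxChars]]"/>

  <!-- Pass 2: trim the last text node and append the ellipsis,
       but only when something was actually cut -->
  <xsl:template match="text()[not(following::text())]" mode="pass2">
    <xsl:variable name="vPrecedingLength"
         select="sum(preceding::text()/string-length())"/>
    <xsl:variable name="vRemaininingLength"
         select="$pmaxChars - $vPrecedingLength"/>
    <xsl:choose>
      <xsl:when test="$vTruncated">
        <xsl:value-of select=
          "concat(replace(.,
                          concat('(^.{0,', $vRemaininingLength, '})\W.*'),
                          '$1'),
                  ' ...')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

If the ellipsis should count toward the limit instead, one option is to subtract its length (4 characters here, including the space) when computing $vRemaininingLength.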
Source: https://stackoverflow.com/questions/10585133/trim-mixed-content-to-max-number-of-characters-with-xslt