Question
I have a pipe-delimited text file, shown below, which I need to transform into a well-formed XML structure (example also shown below) using XSL. The XSL below is my latest attempt. However, I cannot find a way to nest the level 002 elements inside level 001, i.e. to maintain the parent-child relationship, while iterating through the file line by line. Could anyone help?
Pipe-delimited file (input):
001|XXX|YYY
002|AAA|BBB
002|CCC|DD
001|EEF|XXX
002|HHH|GGG
XML file (desired output):
<root>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">XXX</elem>
    <elem name="field3">YYY</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">AAA</elem>
      <elem name="field3">BBB</elem>
    </level002>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">CCC</elem>
      <elem name="field3">DD</elem>
    </level002>
  </level001>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">EEF</elem>
    <elem name="field3">XXX</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">HHH</elem>
      <elem name="field3">GGG</elem>
    </level002>
  </level001>
</root>
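For comparison, here is a minimal Python sketch of the nesting rule that the desired output encodes. It is not XSLT and is not taken from any of the answers. A 001 line opens a new level001 element, and each following 002 line becomes a level002 child of the most recent level001. The function name nest_records is hypothetical. The sketch assumes that every 002 line is preceded by a 001 line.

```python
import xml.etree.ElementTree as ET


def nest_records(text: str) -> ET.Element:
    """Nest 002 records under the preceding 001 record (hypothetical helper)."""
    root = ET.Element("root")
    current_001 = None  # most recent <level001>; 002 lines attach to it
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. after a trailing newline
        fields = [f.strip() for f in line.split("|")]
        if fields[0] == "001":
            current_001 = ET.SubElement(root, "level001")
            parent = current_001
        elif fields[0] == "002" and current_001 is not None:
            parent = ET.SubElement(current_001, "level002")
        else:
            # assumption: only 001/002 records, and no 002 before the first 001
            raise ValueError(f"unexpected record: {line!r}")
        # one <elem name="fieldN"> per pipe-separated value
        for i, value in enumerate(fields, start=1):
            ET.SubElement(parent, "elem", name=f"field{i}").text = value
    return root


sample = "001|XXX|YYY\n002|AAA|BBB\n002|CCC|DD\n001|EEF|XXX\n002|HHH|GGG\n"
print(ET.tostring(nest_records(sample), encoding="unicode"))
```

The key design point is the current_001 variable. It carries state from one line to the next, and a purely line-by-line loop that builds each line in isolation lacks that state.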
Current XSL
<xsl:variable name="Cols">
  <col>field1,1</col>
  <col>field2,2</col>
  <col>field3,3</col>
</xsl:variable>
<xsl:template match="/" name="main">
  <xsl:choose>
    <xsl:when test="unparsed-text-available($pathToCSV, $encoding)">
      <xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />
      <xsl:variable name="lines" select="tokenize($csv, '\n')" as="xs:string+" />
      <root>
        <xsl:for-each select="$lines[position() > 0]">
          <xsl:if test="translate(., '  	 ', '') != ''">
            <level001>
              <xsl:variable name="line" select="." />
              <xsl:variable name="columns" select="tokenize(.,'\|')" as="xs:string+"/>
              <xsl:choose>
                <xsl:when test="$columns[1]='001'">
                  <xsl:for-each select="$Cols/col">
                    <xsl:variable name="column" select="number(substring-after(.,','))"/>
                    <elem name="{substring-before(.,',')}">
                      <!-- trims the whitespace from the beginning and the ending of the value -->
                      <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                    </elem>
                  </xsl:for-each>
                </xsl:when>
                <xsl:when test="$columns[1]='002'">
                  <level002>
                    <xsl:for-each select="$Cols/col">
                      <xsl:variable name="column" select="number(substring-after(.,','))"/>
                      <elem name="{substring-before(.,',')}">
                        <!-- trims the whitespace from the beginning and the ending of the value -->
                        <xsl:value-of select="replace(replace($columns[$column],'\s+$',''),'^\s+','')"/>
                      </elem>
                    </xsl:for-each>
                  </level002>
                </xsl:when>
              </xsl:choose>
            </level001>
          </xsl:if>
        </xsl:for-each>
      </root>
    </xsl:when>
  </xsl:choose>
</xsl:template>
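The structure of the loop above explains why the nesting never appears. The level001 literal result element sits inside xsl:for-each, so it is opened and closed once for every non-blank line, whatever that line's first field is. The hypothetical Python function below models that control flow. It is an illustration of the logic, not a translation of the stylesheet, and it shows that five input lines yield five separate level001 elements.

```python
import xml.etree.ElementTree as ET


def per_line_wrapping(text: str) -> ET.Element:
    """Model of the question's loop: a fresh <level001> for every non-blank line."""
    root = ET.Element("root")
    for line in text.split("\n"):
        if not line.strip():
            continue  # plays the role of the translate(...) != '' test
        wrapper = ET.SubElement(root, "level001")  # opened for EVERY line
        columns = [c.strip() for c in line.split("|")]
        if columns[0] == "001":
            target = wrapper
        elif columns[0] == "002":
            target = ET.SubElement(wrapper, "level002")  # nested in its own wrapper
        else:
            continue  # neither xsl:when matches: the wrapper stays empty
        for i, value in enumerate(columns[:3], start=1):  # three entries in $Cols
            ET.SubElement(target, "elem", name=f"field{i}").text = value
    return root


sample = "001|XXX|YYY\n002|AAA|BBB\n002|CCC|DD\n001|EEF|XXX\n002|HHH|GGG\n"
print(len(per_line_wrapping(sample)))  # 5: one <level001> per line
```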
Answer 1:
I would first transform the flat text into a flat XML structure and then group that with xsl:for-each-group and group-starting-with, as in the following code sample:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="mf xs"
  version="2.0">
  <!-- location of the pipe-delimited input file -->
  <xsl:param name="text-url" as="xs:string" select="'test2012090401.txt'"/>
  <!-- field separator, as a regular expression ('|' must be escaped) -->
  <xsl:param name="sep" as="xs:string" select="'\|'"/>
  <!-- prefix for the generated name attributes: field1, field2, ... -->
  <xsl:param name="field" as="xs:string" select="'field'"/>
  <xsl:output indent="yes"/>
  <!-- Starts a group at each line whose first field equals $level, wraps it in an
       element named "level" plus that field (e.g. level001), and recurses on the
       rest of the group with $level + 1 -->
  <xsl:function name="mf:group" as="node()*">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$nodes" group-starting-with="line[xs:integer(elem[1]) eq $level]">
      <xsl:element name="level{*[1]}">
        <xsl:copy-of select="*"/>
        <xsl:sequence select="mf:group(current-group() except ., $level + 1)"/>
      </xsl:element>
    </xsl:for-each-group>
  </xsl:function>
  <!-- entry point, invoked with -it:main -->
  <xsl:template name="main">
    <!-- step 1: one flat <line> per text line, one <elem> per field -->
    <xsl:variable name="flat">
      <xsl:for-each select="tokenize(unparsed-text($text-url), '\r?\n')">
        <line>
          <xsl:for-each select="tokenize(., $sep)">
            <elem name="{$field}{position()}">
              <xsl:value-of select="."/>
            </elem>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </xsl:variable>
    <!-- step 2: nest the flat lines by level, starting at level 1 -->
    <root>
      <xsl:sequence select="mf:group($flat/line, 1)"/>
    </root>
  </xsl:template>
</xsl:stylesheet>
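The recursion in mf:group is what lets this approach handle any depth. At level n, group-starting-with begins a new group at each line whose first field equals n. The very first item of the population always opens a group, even if it does not match. Each group's head line is emitted, and the function recurses on the rest of the group with level n + 1. Below is a rough Python analogue over already-split lines. It produces hypothetical nested dicts instead of XML, and it drops blank lines before grouping because int('') would fail.

```python
from typing import Dict, List


def group_levels(lines: List[List[str]], level: int) -> List[Dict]:
    """Sketch of mf:group over pre-split lines (hypothetical, non-XML output)."""
    groups: List[List[List[str]]] = []
    for line in lines:
        # group-starting-with: a line at this level starts a new group,
        # and the very first line always starts one
        if not groups or int(line[0]) == level:
            groups.append([line])
        else:
            groups[-1].append(line)
    return [
        {
            "name": "level" + group[0][0],                   # name="level{*[1]}"
            "fields": group[0],                              # xsl:copy-of select="*"
            "children": group_levels(group[1:], level + 1),  # current-group() except .
        }
        for group in groups
    ]


text = "001|XXX|YYY\n002|AAA|BBB\n002|CCC|DD\n001|EEF|XXX\n002|HHH|GGG\n"
rows = [line.split("|") for line in text.splitlines() if line.strip()]
tree = group_levels(rows, 1)
print([node["name"] for node in tree])  # ['level001', 'level001']
```

The recursion stops because group_levels([], n) returns an empty list, just as xsl:for-each-group over an empty sequence produces nothing.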
When I apply that stylesheet with Saxon 9 using java -jar saxon9he.jar -it:main -xsl:sheet.xsl, the result I get is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">XXX</elem>
    <elem name="field3">YYY</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">AAA</elem>
      <elem name="field3">BBB</elem>
    </level002>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">CCC</elem>
      <elem name="field3">DD</elem>
    </level002>
  </level001>
  <level001>
    <elem name="field1">001</elem>
    <elem name="field2">EEF</elem>
    <elem name="field3">XXX</elem>
    <level002>
      <elem name="field1">002</elem>
      <elem name="field2">HHH</elem>
      <elem name="field3">GGG</elem>
      <level/>
    </level002>
  </level001>
</root>
The stylesheet has a parameter named text-url that holds the location of the plain text file. You can set it when running the stylesheet, e.g. by appending text-url=input.txt to the Saxon command line above.
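One artifact in that output is the empty level element inside the last level002. It most likely comes from the input file ending with a newline. In that case tokenize() on '\r?\n' returns a final zero-length string, which becomes an empty line element. mf:group puts that element into the last level002 group. At level 3 it is the first item of the population, so it starts a group anyway and is emitted as level{*[1]} with no first child, i.e. plain "level". Filtering blank lines should avoid this, for example with tokenize(unparsed-text($text-url), '\r?\n')[normalize-space()]. Python's re.split() behaves like tokenize() here, so the effect can be checked with this small sketch. split_lines is a hypothetical helper.

```python
import re
from typing import List


def split_lines(text: str, keep_blank: bool = True) -> List[str]:
    """Split on an optional CR followed by LF, like tokenize($text, '\\r?\\n')."""
    parts = re.split(r"\r?\n", text)
    if keep_blank:
        return parts  # a trailing newline leaves a final '' entry
    return [p for p in parts if p.strip()]  # similar to a [normalize-space()] filter


data = "001|EEF|XXX\n002|HHH|GGG\n"
print(split_lines(data))                    # ['001|EEF|XXX', '002|HHH|GGG', '']
print(split_lines(data, keep_blank=False))  # ['001|EEF|XXX', '002|HHH|GGG']
```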
Answer 2:
You can find a solution to essentially the same problem here:
http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml
The core is a recursive grouping template:
<xsl:template name="process-level">
  <xsl:param name="population" required="yes" as="element()*"/>
  <xsl:param name="level" required="yes" as="xs:integer"/>
  <xsl:for-each-group select="$population"
                      group-starting-with="*[xs:integer(@level) eq $level]">
    <xsl:element name="{@tag}">
      <xsl:copy-of select="@ID[string(.)], @REF[string(.)]"/>
      <xsl:value-of select="normalize-space(@text)"/>
      <xsl:call-template name="process-level">
        <xsl:with-param name="population"
                        select="current-group()[position() != 1]"/>
        <xsl:with-param name="level"
                        select="$level + 1"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:for-each-group>
</xsl:template>
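Judging by the attributes the template reads, each element in $population carries a level, a tag and a text attribute, and optionally ID and REF. The Python sketch below applies the same recursion to dict records with those keys. The record shape, the function name process_level, and the sample records (made up from the question's data) are illustrative assumptions, and ID/REF copying is omitted.

```python
from typing import Dict, List, Tuple

# (tag, normalized text, children): a hypothetical stand-in for the built element
Node = Tuple[str, str, List["Node"]]


def process_level(population: List[Dict[str, str]], level: int) -> List[Node]:
    """Sketch of the process-level template over dict records (ID/REF omitted)."""
    groups: List[List[Dict[str, str]]] = []
    for rec in population:
        # group-starting-with="*[xs:integer(@level) eq $level]";
        # the first record always opens a group
        if not groups or int(rec["level"]) == level:
            groups.append([rec])
        else:
            groups[-1].append(rec)
    return [
        (
            group[0]["tag"],                      # xsl:element name="{@tag}"
            " ".join(group[0]["text"].split()),   # normalize-space(@text)
            process_level(group[1:], level + 1),  # current-group()[position() != 1]
        )
        for group in groups
    ]


# Made-up records modelled on the question's data (level 1 = 001, level 2 = 002)
records = [
    {"level": "1", "tag": "level001", "text": "XXX YYY"},
    {"level": "2", "tag": "level002", "text": "AAA  BBB"},
    {"level": "2", "tag": "level002", "text": "CCC DD"},
    {"level": "1", "tag": "level001", "text": "EEF XXX"},
    {"level": "2", "tag": "level002", "text": "HHH GGG"},
]
tree = process_level(records, 1)
print(tree[0])  # ('level001', 'XXX YYY', [('level002', 'AAA BBB', []), ('level002', 'CCC DD', [])])
```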
Answer 3:
Well, you're iterating over every line and closing the level001 tag as soon as you have finished with each line. Why not try something like this (pseudocode):
- for each line
  - if the line is a level001 line
    - print <level001>
    - print the body of the level001 line
    - get the index of the next level001 line
    - for each level002 line between this line and the next level001 line
      - print <level002>
      - print the body of the level002 line
      - print </level002>
    - print </level001>
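The index-based idea in that pseudocode can be made concrete as follows. This is a minimal Python sketch that builds the XML as text. It also prints each 001 line's own fields, which the desired output requires. The function names emit_nested and _fields are hypothetical, and the sketch escapes values with xml.sax.saxutils.escape.

```python
from typing import List
from xml.sax.saxutils import escape


def _fields(row: List[str]) -> List[str]:
    """One <elem name="fieldN"> line per value (hypothetical helper)."""
    return [f'<elem name="field{k}">{escape(v)}</elem>' for k, v in enumerate(row, start=1)]


def emit_nested(lines: List[str]) -> str:
    """Index-based nesting following the pseudocode (hypothetical helper)."""
    rows = [[f.strip() for f in line.split("|")] for line in lines if line.strip()]
    out = ["<root>"]
    for i, row in enumerate(rows):
        if row[0] != "001":
            continue  # 002 rows are printed by their parent 001 row
        # index of the next level001 row, or the end of the input
        nxt = next((j for j in range(i + 1, len(rows)) if rows[j][0] == "001"), len(rows))
        out.append("<level001>")
        out.extend(_fields(row))  # the 001 line's own fields
        for child in rows[i + 1:nxt]:
            if child[0] == "002":
                out.append("<level002>")
                out.extend(_fields(child))
                out.append("</level002>")
        out.append("</level001>")
    out.append("</root>")
    return "\n".join(out)


text = "001|XXX|YYY\n002|AAA|BBB\n002|CCC|DD\n001|EEF|XXX\n002|HHH|GGG\n"
print(emit_nested(text.splitlines()))
```

Scanning ahead for the next 001 index revisits lines, so this costs more than a single pass. It matches the pseudocode, though, and it is fine for files of this shape.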
Source: https://stackoverflow.com/questions/12259961/xsl-create-well-formed-xml-from-text-file