Is XSLT a good approach to convert text to an XML structure?

試著忘記壹切 submitted on 2019-11-30 10:21:18

The statement that XSLT "isn't suitable for transforming from structured text to XML" and the statement that "XSLT must have XML as the input document" are both wrong.

I am thinking of two approaches:

  1. Define a business entity, fill the entity properties by using substring functions on the input text, and then serialize the entity to XML.

  2. Predefine the XML structure, use XSLT to navigate to each node, and fill the values by using substring functions on the input text.

In fact, Approach 2 is quite easy to accomplish with XSLT:

I. XSLT 1.0:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

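 <!-- Recursive template: emits one <user> for the first line of pText,
      then calls itself on the remaining text. A last line without a
      trailing newline is not processed. -->
 <xsl:template match="/*/text()" name="processLines">
  <xsl:param name="pText" select="."/>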

  <xsl:if test="contains($pText, '&#xA;')">
    <xsl:variable name="vLine" select=
     "substring-before($pText, '&#xA;')"/>

     <user>
       <name>
         <xsl:value-of select=
         "translate(substring-before($vLine, ' '),'_',' ')"/>
       </name>
       <city>
         <xsl:value-of select=
         "translate(substring-before(substring-after($vLine, ' '),' '),
                    '_',
                    ' '
                    )
         "/>
       </city>
       <zipCode>
         <xsl:value-of select=
         "translate(substring-after(substring-after($vLine, ' '),' '),
                    '_',
                    ' '
                    )
         "/>
       </zipCode>
     </user>

     <xsl:call-template name="processLines">
      <xsl:with-param name="pText" select=
      "substring-after($pText, '&#xA;')"/>
     </xsl:call-template>
  </xsl:if>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the specially formatted text (wrapped in a single top element to make it well-formed; as we'll see, in XSLT 2.0 such wrapping isn't necessary):

<t>Testuser new_york 10018
usera seattle 98000
userb bellevue 98004
userb redmond 98052
</t>

the wanted result is produced:

<user>
   <name>Testuser</name>
   <city>new york</city>
   <zipCode>10018</zipCode>
</user>
<user>
   <name>usera</name>
   <city>seattle</city>
   <zipCode>98000</zipCode>
</user>
<user>
   <name>userb</name>
   <city>bellevue</city>
   <zipCode>98004</zipCode>
</user>
<user>
   <name>userb</name>
   <city>redmond</city>
   <zipCode>98052</zipCode>
</user>

Notes:

  1. This is just a demo of how to accomplish the task. This is why I am not processing fixed-width fields (which would be even easier), but space-separated values.

  2. Any space contained in any value is entered in the input as an underscore (or any character of our choosing that we know will never be part of any value). On output, any underscore is translated to a real space.
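
To illustrate the remark in Note 1 about fixed-width fields, here is a minimal sketch of the same XSLT 1.0 template using substring() with fixed positions. The column layout (name in columns 1-10, city in 11-25, zip code in 26-30) is an assumption made for this example and does not come from the question:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Assumed layout (hypothetical): columns 1-10 name,
      11-25 city, 26-30 zip code -->
 <xsl:template match="/*/text()" name="processLines">
  <xsl:param name="pText" select="."/>

  <xsl:if test="contains($pText, '&#xA;')">
   <xsl:variable name="vLine" select=
    "substring-before($pText, '&#xA;')"/>

   <user>
    <!-- substring() is 1-based: substring(string, start, length).
         normalize-space() trims the padding and also collapses
         runs of whitespace inside a value. -->
    <name>
     <xsl:value-of select="normalize-space(substring($vLine, 1, 10))"/>
    </name>
    <city>
     <xsl:value-of select="normalize-space(substring($vLine, 11, 15))"/>
    </city>
    <zipCode>
     <xsl:value-of select="normalize-space(substring($vLine, 26, 5))"/>
    </zipCode>
   </user>

   <xsl:call-template name="processLines">
    <xsl:with-param name="pText" select=
     "substring-after($pText, '&#xA;')"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>
```

With fixed positions a value such as "new york" can contain a real space, so the underscore convention from Note 2 is not needed in this variant.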

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Read the whole text file as a single string -->
 <xsl:variable name="vText" select=
  "unparsed-text('file:///c:/temp/delete/delete.txt')"/>

 <xsl:variable name="vLines" select=
  "tokenize($vText, '&#xD;?&#xA;')[normalize-space()]"/>

 <xsl:template match="/">
  <xsl:for-each select="$vLines">
    <xsl:variable name="vFields" select=
    "tokenize(., ' ')[normalize-space()]"/>
   <user>
     <name>
       <xsl:sequence select="translate($vFields[1], '_',' ')"/>
     </name>
     <city>
       <xsl:sequence select="translate($vFields[2], '_',' ')"/>
     </city>
     <zipCode>
       <xsl:sequence select="translate($vFields[3], '_',' ')"/>
     </zipCode>
   </user>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to any XML document (which is ignored; in XSLT 2.0 a source XML document isn't actually required), and the file C:\temp\delete\delete.txt contains:

Testuser new_york 10018
usera seattle 98000
userb bellevue 98004
userb redmond 98052

again, the wanted, correct result is produced:

<user>
   <name>Testuser</name>
   <city>new york</city>
   <zipCode>10018</zipCode>
</user>
<user>
   <name>usera</name>
   <city>seattle</city>
   <zipCode>98000</zipCode>
</user>
<user>
   <name>userb</name>
   <city>bellevue</city>
   <zipCode>98004</zipCode>
</user>
<user>
   <name>userb</name>
   <city>redmond</city>
   <zipCode>98052</zipCode>
</user>

Notes:

  1. Use of the standard XSLT 2.0 function unparsed-text().

  2. Use of the standard XPath 2.0 function tokenize().
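
The point that XSLT 2.0 needs no source XML document can be made explicit with a named entry template instead of match="/". The following is only a sketch of that idea: the template name main and the parameter name pFile are arbitrary names chosen for this example, and the default path is simply the one used above.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- pFile is a hypothetical parameter name; it can be
      overridden when the transformation is started -->
 <xsl:param name="pFile"
  select="'file:///c:/temp/delete/delete.txt'"/>

 <!-- "main" is an arbitrary name for the entry template;
      no source document is read -->
 <xsl:template name="main">
  <!-- Split the text into non-empty lines -->
  <xsl:for-each select=
   "tokenize(unparsed-text($pFile), '&#xD;?&#xA;')[normalize-space()]">
   <!-- Split each line into space-separated fields -->
   <xsl:variable name="vFields" select=
    "tokenize(normalize-space(.), ' ')"/>
   <user>
    <name>
     <xsl:value-of select="translate($vFields[1], '_', ' ')"/>
    </name>
    <city>
     <xsl:value-of select="translate($vFields[2], '_', ' ')"/>
    </city>
    <zipCode>
     <xsl:value-of select="translate($vFields[3], '_', ' ')"/>
    </zipCode>
   </user>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

How the entry template is selected depends on the processor. With Saxon, for example, the -it:main command-line option starts the transformation at this template without supplying a source document.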

Final note:

Very complex text processing has been done at an industrial level entirely in XSLT. The FXSL library contains a generic LR(1) parser and a tweaked YACC that produces XML-formatted tables, which are the input to this generic run-time LR(1) parser.

Using this tool I successfully built parsers for such complex languages as JSON and XPath 2.0.

XSLT 2.0 is highly suitable for converting structured text to XML. You might like to look at the 2010 paper by Stephanie Haupt and Maik Stuehrenberg here:

http://www.balisage.net/Proceedings/vol5/html/Haupt01/BalisageVol5-Haupt01.html

or my own 2008 paper

http://www.saxonica.com/papers/ideadb-1.1/mhk-paper.xml

for case studies.

I wouldn't normally attempt the task using XSLT 1.0, though as Dimitre's answer shows, it can be done in simple cases.
