Generic XML to CSV conversion [closed]

Submitted by 本小妞迷上赌 on 2020-01-14 03:01:06

Question


I am trying to convert dynamic XML to CSV. I searched for ways to do this but did not find a suitable answer.

The structure of the XML is dynamic: it can hold product data, geography data, or anything similar. So I cannot use a predefined XSL stylesheet or a Castor mapping.

The tag names should form the header of the CSV. For example:

<Ctry>
  <datarow>
     <CtryName>Ctry1</CtryName>
     <CtryID>12361</CtryID>
    <State>
      <datarow>
         <StateName>State1</StateName>
         <StateID>12361</StateID>
        <City>
           <datarow>
              <CityName>City1</CityName>
               <CityID>12361</CityID>
           </datarow>
        </City>
      </datarow>
      <datarow>
         <StateName>State2</StateName>
         <StateID>12361</StateID>
      </datarow>
      </State>
  </datarow>
</Ctry>

The CSV should look like this:

Header: CtryName   CtryID     StateName  StateID     CityName   CityID
Row1:   Ctry1      12361      State1     12361       City1      12361
Row2:   Ctry1      12361      State2     12361

Could you please recommend a suitable approach to this problem?


Answer 1:


Below is a transcript showing a generic XSLT 2.0 stylesheet performing this conversion. The only assumption the stylesheet makes is that each record is wrapped in a <datarow> element. Based on the requested results, it treats every child of a <datarow> that has no element children of its own as a CSV field:

Data:

  T:\ftemp>type xml2csv.xml 
  <Ctry>
    <datarow>
       <CtryName>Ctry1</CtryName>
       <CtryID>12361</CtryID>
      <State>
        <datarow>
           <StateName>State1</StateName>
           <StateID>12361</StateID>
          <City>
             <datarow>
                <CityName>City1</CityName>
                 <CityID>12361</CityID>
             </datarow>
          </City>
        </datarow>
        <datarow>
           <StateName>State2</StateName>
           <StateID>12361</StateID>
        </datarow>
        </State>
    </datarow>
  </Ctry>

Execution:

  T:\ftemp>call xslt2 xml2csv.xml xml2csv.xsl 
  CtryName,CtryID,StateName,StateID,CityName,CityID
  Ctry1,12361,State1,12361,City1,12361
  Ctry1,12361,State2,12361

Stylesheet:

  T:\ftemp>type xml2csv.xsl 
  <?xml version="1.0" encoding="US-ASCII"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version="2.0">

  <xsl:output method="text"/>

  <xsl:variable name="fields" 
                select="distinct-values(//datarow/*[not(*)]/name(.))"/>

  <xsl:template match="/">
    <!--header row-->
    <xsl:value-of select="$fields" separator=","/>

    <!--body-->
    <xsl:apply-templates select="*"/>

    <!--final line terminator-->
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!--elements only process elements, not text-->
  <xsl:template match="*">
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!--these elements are CSV fields-->
  <xsl:template match="datarow/*[not(*)]">
    <!--replicate ancestors if necessary-->
    <xsl:if test="position()=1 and ../preceding-sibling::datarow">
      <xsl:for-each select="ancestor::datarow[position()>1]/*[not(*)]">
        <xsl:call-template name="doThisField"/>
      </xsl:for-each>
    </xsl:if>
    <xsl:call-template name="doThisField"/>
  </xsl:template>

  <!--put out a field ending the previous field and escaping content-->
  <xsl:template name="doThisField">
    <xsl:choose>
      <xsl:when test="name(.)=$fields[1]">
        <!--previous line terminator-->
        <xsl:text>&#xa;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <!--previous field terminator-->
        <xsl:text>,</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <!--field value escaped per RFC4180-->
    <xsl:choose>
      <xsl:when test="contains(.,'&#x22;') or 
                      contains(.,',') or
                      contains(.,'&#xa;')">
        <xsl:text>"</xsl:text>
        <xsl:value-of select="replace(.,'&#x22;','&#x22;&#x22;')"/>
        <xsl:text>"</xsl:text>
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  </xsl:stylesheet>

Note that the above code escapes the individual fields per RFC4180.
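
For readers who want to see the quoting rule in isolation, here is a minimal Python sketch of the same test the `doThisField` template applies. The function name `escape_field` is hypothetical and not part of the stylesheet. Like the stylesheet, it checks only for a double quote, a comma, or a line feed, so treat it as an illustration rather than a complete RFC 4180 implementation.

```python
def escape_field(value: str) -> str:
    """Quote a CSV field only when it contains a quote, comma, or line feed."""
    if '"' in value or "," in value or "\n" in value:
        # Double every embedded quote, then wrap the whole field in quotes.
        return '"' + value.replace('"', '""') + '"'
    # Plain values are written unchanged.
    return value


print(escape_field("City1"))
print(escape_field('Paris, "City of Light"'))
```

A value such as `City1` passes through untouched, while a value containing a comma or a quote is wrapped in double quotes with each embedded quote doubled.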

My profile has a link to my web site where you will find a directory of free XML resources including an XSLT stylesheet to convert RFC4180 CSV files into XML files.

This is an XSLT 1.0 solution to the question, as requested by the original poster:

t:\ftemp>type xml2csv1.xsl 
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:variable name="firstFieldName" 
              select="name((//datarow/*[not(*)])[1])"/>

<xsl:key name="names" match="datarow/*[not(*)]" use="name(.)"/>

<xsl:template match="/">
  <!--header row-->
  <xsl:for-each select="//datarow/*[not(*)]
                        [generate-id(.)=
                         generate-id(key('names',name(.))[1])]">
    <xsl:if test="position()>1">,</xsl:if>
    <xsl:value-of select="name(.)"/>
  </xsl:for-each>

  <!--body-->
  <xsl:apply-templates select="*"/>

  <!--final line terminator-->
  <xsl:text>&#xa;</xsl:text>
</xsl:template>

<!--elements only process elements, not text-->
<xsl:template match="*">
  <xsl:apply-templates select="*"/>
</xsl:template>

<!--these elements are CSV fields-->
<xsl:template match="datarow/*[not(*)]">
  <!--replicate ancestors if necessary-->
  <xsl:if test="position()=1 and ../preceding-sibling::datarow">
    <xsl:for-each select="ancestor::datarow[position()>1]/*[not(*)]">
      <xsl:call-template name="doThisField"/>
    </xsl:for-each>
  </xsl:if>
  <xsl:call-template name="doThisField"/>
</xsl:template>

<!--put out a field ending the previous field and escaping content-->
<xsl:template name="doThisField">
  <xsl:choose>
    <xsl:when test="name(.)=$firstFieldName">
      <!--previous line terminator-->
      <xsl:text>&#xa;</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <!--previous field terminator-->
      <xsl:text>,</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
  <!--field value escaped per RFC4180-->
  <xsl:choose>
    <xsl:when test="contains(.,'&#x22;') or 
                    contains(.,',') or
                    contains(.,'&#xa;')">
      <xsl:text>"</xsl:text>
      <xsl:call-template name="escapeQuote"/>
      <xsl:text>"</xsl:text>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!--escape a double quote in the current node value with two double quotes-->
<xsl:template name="escapeQuote">
  <xsl:param name="rest" select="."/>
  <xsl:choose>
    <xsl:when test="contains($rest,'&#x22;')">
      <xsl:value-of select="substring-before($rest,'&#x22;')"/>
      <xsl:text>""</xsl:text>
      <xsl:call-template name="escapeQuote">
        <xsl:with-param name="rest" select="substring-after($rest,'&#x22;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$rest"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>
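
If no XSLT processor is at hand, the same flattening idea can be sketched with Python's standard library. This is not a port of the stylesheets above, only an illustration under stated assumptions: the root element directly contains the top-level `<datarow>` elements, children without element children are fields, and children with element children are containers of nested `<datarow>` elements. The names `header`, `rows`, `xml_to_csv`, and `escape_field` are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Ctry>
  <datarow>
    <CtryName>Ctry1</CtryName>
    <CtryID>12361</CtryID>
    <State>
      <datarow>
        <StateName>State1</StateName>
        <StateID>12361</StateID>
        <City>
          <datarow>
            <CityName>City1</CityName>
            <CityID>12361</CityID>
          </datarow>
        </City>
      </datarow>
      <datarow>
        <StateName>State2</StateName>
        <StateID>12361</StateID>
      </datarow>
    </State>
  </datarow>
</Ctry>"""


def escape_field(value):
    # Same quoting test as the stylesheets: quote, comma, or line feed.
    if '"' in value or "," in value or "\n" in value:
        return '"' + value.replace('"', '""') + '"'
    return value


def header(root):
    # Distinct names of childless <datarow> children, in document order.
    names = []
    for row in root.iter("datarow"):
        for child in row:
            if len(child) == 0 and child.tag not in names:
                names.append(child.tag)
    return names


def rows(datarow, inherited=()):
    # Fields of this row, prefixed with the fields of all ancestor rows.
    own = [escape_field(c.text or "") for c in datarow if len(c) == 0]
    current = list(inherited) + own
    # Nested rows live inside container children such as <State> or <City>.
    nested = [dr for c in datarow if len(c) > 0 for dr in c.findall("datarow")]
    if not nested:
        yield current  # innermost row: emit one CSV line
    for dr in nested:
        yield from rows(dr, current)


def xml_to_csv(xml_text):
    root = ET.fromstring(xml_text)
    lines = [",".join(header(root))]
    for dr in root.findall("datarow"):  # assumption: rows sit directly under the root
        lines.extend(",".join(r) for r in rows(dr))
    return "\n".join(lines) + "\n"


print(xml_to_csv(SAMPLE), end="")
```

For the sample document this prints the same three lines as the transcript above. As in the stylesheets, fields are positional: a row that stops at the state level is simply shorter and is not padded with empty city columns.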


Source: https://stackoverflow.com/questions/18951498/generic-xml-to-csv-conversion
