XPath to parse eCFR XML using attributes and nodes

别等时光非礼了梦想. Submitted on 2020-01-06 20:24:59

Question


This question has been significantly edited to make things a bit clearer.

I am attempting to pull data out of the electronic Code of Federal Regulations XML feed (http://www.gpo.gov/fdsys/bulkdata/CFR/2015/title-15/CFR-2015-title15-vol2.xml) and am having trouble.

Specifically, I'd like to grab data that will be matched by a combination of Node and Attribute. In the following snippet of XML, you can see some of the text I'd like to grab. I would like to obtain the data for each FP node where the attribute FP-2 is present. I would also like to grab the data for each FP node having the attribute FP-1.

<APPENDIX>
              <EAR>Pt. 774, Supp. 1</EAR>
              <HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
              <HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
              <HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
              <FP SOURCE="FP-2">
                <E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
              </FP>
              
              <FP SOURCE="FP-2">
                <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="04">License Requirements</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Reason for Control:</E> NS, AT, UN</FP>
              <GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
                <BOXHD>
                  <CHED H="1">Control(s)</CHED>
                  <CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
                </BOXHD>
                <ROW>
                  <ENT I="01">NS applies to entire entry</ENT>
                  <ENT>NS Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">AT applies to entire entry</ENT>
                  <ENT>AT Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">UN applies to entire entry</ENT>
                  <ENT>See § 746.1(b) for UN controls.</ENT>
                </ROW>
              </GPOTABLE>
              <FP SOURCE="FP-1">
                <E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">LVS:</E> $3,000 for 0A018.b</FP>
              <FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
              <FP SOURCE="FP-1">
                <E T="03">GBS:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="03">CIV:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="04">List of Items Controlled</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Definitions:</E> N/A</FP>
              <FP>
                <E T="03">Items:</E> a. [Reserved]</FP>
              <P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
              <NOTE>
                <HD SOURCE="HED">
                  <E T="03">Note:</E>
                </HD>
                <P>
                  <E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
                </P>
                <P>
                  <E T="03">a. Ammunition crimped without a projectile (blank star);</E>
                </P>
 </APPENDIX>

To complicate matters, I'm trying to pull this data into Filemaker, but in this edited version of the question I'll stick to plain XSL.

The following XSL grabs all of the FP nodes without differentiation.

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Modifying this to use xsl:template match="FP[@SOURCE='FP-1']" lets me match on the attribute, but I'm still not clear on how to capture the data I need. Thoughts?


Answer 1:


A few things:

  1. Your XSLT is not actually valid XSLT.
  2. In XPath, an attribute reference (e.g., SOURCE) must be prefixed with @.
  3. Finally, there are many FP-1 and FP-2 elements, but your setup selects only the first instances.

Consider the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>

<xsl:template match="/">
   <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">

    <METADATA>
        <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
    <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
    </METADATA>

    <RESULTSET>

    <xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
    <ROW>
        <COL>
            <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
        </COL>
    </ROW>
    </xsl:for-each>    

    <xsl:for-each select="//FP[@SOURCE = 'FP-1']/E[@T='02']">
    <ROW>
        <COL>
            <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
        </COL>
    </ROW>
    </xsl:for-each>        

    </RESULTSET>
</FMPXMLRESULT>

</xsl:template>
</xsl:stylesheet>

Which would output:

<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
    <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET>
    <ROW>
      <COL>
        <DATA>0A002</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A018</DATA>
      </COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>

And here is partial output when the stylesheet is run against the full XML file at the link in the question:

<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
    <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET>
    <ROW>
      <COL>
        <DATA>2A000</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A002</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A018</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A521</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A604</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A606</DATA>
      </COL>
    </ROW>
    ...

In fact, if you point your XSLT processor at the GPO link, it outputs every matching FP-1 and FP-2 element. I just did so with Python and got close to 3,000 lines of output.
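For readers who want to check the selection logic without an XSLT processor, here is a minimal Python sketch. It is not the script used above (which is not shown); it uses only the standard library's xml.etree.ElementTree, and the SAMPLE string and the eccn_codes helper are illustrative names, with SAMPLE being a trimmed copy of the snippet from the question. It mirrors the XPath //FP[@SOURCE = 'FP-2']/E[@T='02'] and the substring(.,1,5) call from the stylesheet.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the APPENDIX snippet from the question (illustrative).
SAMPLE = """
<APPENDIX>
  <FP SOURCE="FP-2">
    <E T="02">0A002Power generating or propulsion equipment</E>
  </FP>
  <FP SOURCE="FP-2">
    <E T="02">0A018Items on the Wassenaar Munitions List</E>
  </FP>
  <FP SOURCE="FP-1">
    <E T="04">License Requirements</E>
  </FP>
  <FP SOURCE="FP-1">
    <E T="03">Reason for Control:</E> NS, AT, UN</FP>
</APPENDIX>
"""


def eccn_codes(root):
    """Return the first five characters of each FP-2 heading (E with T="02")."""
    codes = []
    # ElementTree needs a leading "." to search descendants of root.
    for e in root.findall(".//FP[@SOURCE='FP-2']/E[@T='02']"):
        text = "".join(e.itertext())  # string value of E, like "." in XPath
        codes.append(text[:5])        # same characters as substring(., 1, 5)
    return codes


root = ET.fromstring(SAMPLE)
codes = eccn_codes(root)
print(codes)
```

On the sample, only the two FP-2 headings are picked up, because none of the FP-1 elements contain an E child with T="02".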




Answer 2:


Your question is still not clear. If I concentrate on this part:

I would like to obtain the data for each FP node where the attribute FP-2 is present. I would also like to grab the data for each FP node having the attribute FP-1.

then you probably want to change this:

<xsl:for-each select="//FP">

to:

<xsl:for-each select="//FP[@SOURCE='FP-1' or @SOURCE='FP-2']">

Note that this returns the value of each FP element where the SOURCE attribute has a value of either 'FP-1' or 'FP-2'. I see no "FP node where the attribute FP-2 is present" in your input.

Note also that the // syntax is expensive in terms of processing power. You will get better performance if you use a full, explicit path.
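To make the two notes above concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. Its limited XPath subset has no "or" operator in predicates, so the attribute test is done in Python instead. SAMPLE, WANTED, and fp_values are illustrative names, and SAMPLE is a condensed version of the question's snippet in which the FP elements are direct children of the root; the full path to FP in the real file is not shown here, so it would have to be worked out from the actual document.

```python
import xml.etree.ElementTree as ET

# Condensed version of the question's snippet (illustrative).
SAMPLE = """
<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
  <FP><E T="03">Items:</E> a. [Reserved]</FP>
</APPENDIX>
"""

WANTED = {"FP-1", "FP-2"}


def fp_values(elements):
    """Return (SOURCE, text) for each FP whose SOURCE is FP-1 or FP-2."""
    rows = []
    for fp in elements:
        source = fp.get("SOURCE")  # None when the attribute is absent
        if source in WANTED:
            # Join all descendant text and collapse runs of whitespace.
            text = " ".join("".join(fp.itertext()).split())
            rows.append((source, text))
    return rows


root = ET.fromstring(SAMPLE)

# Descendant search: walks every element in the tree, like //FP.
anywhere = fp_values(root.iter("FP"))

# Explicit path: looks only at direct children of the root element.
direct = fp_values(root.findall("FP"))

print(anywhere)
```

Both calls return the same rows for this sample; the FP element without a SOURCE attribute is skipped in either case. The explicit form avoids scanning the whole tree, which is the point of the performance note.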



Source: https://stackoverflow.com/questions/31887091/xpath-to-parse-ecfr-xml-using-attributes-and-nodes
