How to pull data from KML/XML?

Submitted by 拈花ヽ惹草 on 2019-12-24 11:52:37

Question


I have some data that I converted from a KML file to XML, and I'd like to know how to use PHP or Ruby to pull out things like the neighborhood names and coordinates. I know how to do it when each value is wrapped in its own tag, like so:

<cities>
  <neighborhood>Gotham</neighborhood>
</cities>

but the data is unfortunately formatted as:

<SimpleData name="neighborhd">Colgate Center</SimpleData>

instead of

<neighborhd>Colgate Center</neighborhd>

This is the KML source:

How can I use PHP or Ruby to pull data from something like this? I installed some Ruby gems for parsing XML, but I haven't worked with XML much.


Answer 1:


Your XML is invalid, but Nokogiri will attempt to fix it up.

Here's how to check for invalid XML/XHTML/HTML and how to rewrite the section you want.

Here's the setup:

require 'nokogiri'

doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
      <SimpleField type="int" name="nid"/>
      <SimpleField type="string" name="neighborhd"/>
      <SimpleField type="string" name="place"/>
      <SimpleField type="string" name="placecode"/>
      <SimpleField type="string" name="nbr_type"/>
      <SimpleField type="string" name="po_name"/>
      <SimpleField type="string" name="metro"/>
      <SimpleField type="string" name="country"/>
      <SimpleField type="string" name="state"/>
      <SimpleField type="string" name="statefips"/>
      <SimpleField type="string" name="county"/>
      <SimpleField type="string" name="countyfips"/>
      <SimpleField type="string" name="mcd"/>
      <SimpleField type="string" name="mcdfips"/>
      <SimpleField type="string" name="cbsa"/>
      <SimpleField type="string" name="cbsacode"/>
      <SimpleField type="string" name="cbsatype"/>
      <SimpleField type="double" name="cenlat"/>
      <SimpleField type="double" name="cenlon"/>
      <SimpleField type="int" name="color"/>
      <SimpleField type="string" name="ncs_code"/>
      <SimpleField type="string" name="release"/>
    </Schema>
    <Style id="KMLSTYLER_6">
      <LabelStyle>
        <scale>1.0</scale>
      </LabelStyle>
      <LineStyle>
        <colorMode>normal</colorMode>
      </LineStyle>
      <PolyStyle>
        <color>7f4080ff</color>
        <colorMode>random</colorMode>
      </PolyStyle>
    </Style>
    <name>Sample_Neighborhoods_NYC</name>
    <visibility>1</visibility>
    <Folder id="kml_ft_Sample_Neighborhoods_Samples">
      <name>Sample_Neighborhoods_Samples</name>
      <Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
        <name>Sample_Neighborhoods_NYC</name>
        <Placemark id="kml_1">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <styleUrl>#KMLSTYLER_6</styleUrl>
          <ExtendedData>
            <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
              <SimpleData name="nid">7086</SimpleData>
              <SimpleData name="neighborhd">Colgate Center</SimpleData>
              <SimpleData name="place">Jersey City</SimpleData>
              <SimpleData name="placecode">36000</SimpleData>
              <SimpleData name="nbr_type">S</SimpleData>
              <SimpleData name="po_name">JERSEY CITY</SimpleData>
              <SimpleData name="metro">New York City, NY</SimpleData>
              <SimpleData name="country">USA</SimpleData>
              <SimpleData name="state">NJ</SimpleData>
              <SimpleData name="statefips">34</SimpleData>
              <SimpleData name="county">Hudson</SimpleData>
              <SimpleData name="countyfips">34017</SimpleData>
              <SimpleData name="mcd">Jersey City</SimpleData>
              <SimpleData name="mcdfips">36000</SimpleData>
              <SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
              <SimpleData name="cbsacode">35620</SimpleData>
              <SimpleData name="cbsatype">Metro</SimpleData>
              <SimpleData name="cenlat">40.7145135000001</SimpleData>
              <SimpleData name="cenlon">-74.0343385</SimpleData>
              <SimpleData name="color">1</SimpleData>
              <SimpleData name="ncs_code">40910000</SimpleData>
              <SimpleData name="release">1.12.2</SimpleData>
            </SchemaData>
          </ExtendedData>
          <Polygon>
            <outerBoundaryIs>
              <LinearRing>
                <coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0                     -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
              </LinearRing>
            </outerBoundaryIs>
          </Polygon>
        </Placemark>
        <Placemark id="kml_2">
          <name>Colgate Center</name>
          <Snippet> </Snippet>
          <ExtendedData>
EOT

Here's how to see if there are errors. Any time doc.errors is not empty, the parser found a problem in the document.

puts doc.errors

Here's one way to find the SimpleData nodes throughout a document. I prefer CSS selectors over XPath because they are easier to read, but XPath is sometimes the better choice because it allows finer-grained searches. It's worth learning both.

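# Visit every SimpleData node that sits anywhere inside an ExtendedData node.
doc.search('ExtendedData SimpleData').each do |simple_data|
  # The field name lives in the "name" attribute, the value in the node's text.
  node_name = simple_data['name']
  puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end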

Here's the output after running:

Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>

I'm not trying to modify the DOM, but it's easy to do:

doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  # Swap the <SimpleData name="x"> node for an <x> node carrying the same text.
  simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end

puts doc.to_xml

After running, this is the affected section:

<ExtendedData>
  <SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
    <nid>7086</nid>
    <neighborhd>Colgate Center</neighborhd>
    <place>Jersey City</place>
    <placecode>36000</placecode>
    <nbr_type>S</nbr_type>
    <po_name>JERSEY CITY</po_name>
    <metro>New York City, NY</metro>
    <country>USA</country>
    <state>NJ</state>
    <statefips>34</statefips>
    <county>Hudson</county>
    <countyfips>34017</countyfips>
    <mcd>Jersey City</mcd>
    <mcdfips>36000</mcdfips>
    <cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
    <cbsacode>35620</cbsacode>
    <cbsatype>Metro</cbsatype>
    <cenlat>40.7145135000001</cenlat>
    <cenlon>-74.0343385</cenlon>
    <color>1</color>
    <ncs_code>40910000</ncs_code>
    <release>1.12.2</release>
  </SchemaData>
</ExtendedData>


Source: https://stackoverflow.com/questions/16861464/how-to-pull-data-from-kml-xml
