Question
My problem is very similar to the one found here:
How to pull data from KML/XML?
The answer to the above question is to use Nokogiri to fix the format.
I wonder if there is a way to solve a similar problem without fixing the format first.
How can I get the values of the dict, so that I can get 'FM2' and 'FM3' from the Element SimpleData below?
Here is my kml:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>Test.kml</name>
<open>1</open>
<Schema name="test" id="S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
<SimpleField type="string" name="ID"> <displayName><b>ID</b></displayName>
</SimpleField>
<SimpleField type="string" name="cname"><displayName><b>cname</b></displayName>
</SimpleField>
</Schema>
<Style id="falseColor01">
<BalloonStyle>
<text><![CDATA[<table border="0"><tr>
<td><b>ID</b></td><td>$[test/ID]</td></tr>
<tr><td><b>cname</b></td><td>$[test/cname]</td></tr>
</table>]]></text>
</BalloonStyle>
<LineStyle>
<color>ffffff00</color>
<width>3</width>
</LineStyle>
<PolyStyle>
<color>ffffff00</color>
<colorMode>random</colorMode>
<fill>0</fill>
</PolyStyle>
</Style>
<StyleMap id="falseColor0">
<Pair>
<key>normal</key>
<styleUrl>#falseColor00</styleUrl>
</Pair>
<Pair>
<key>highlight</key>
<styleUrl>#falseColor01</styleUrl>
</Pair>
</StyleMap>
<Style id="falseColor00">
<BalloonStyle>
</BalloonStyle>
<LineStyle>
<color>ffffff00</color>
<width>3</width>
</LineStyle>
<PolyStyle>
<color>ffffff00</color>
<colorMode>random</colorMode>
<fill>0</fill>
</PolyStyle>
</Style>
<Folder id="layer 0">
<name>Test_1</name>
<open>1</open>
<Placemark>
<styleUrl>#falseColor0</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
<SimpleData name="ID">FM2</SimpleData>
<SimpleData name="cname">FM2</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>150.889999,-32.17281600000001,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark>
<styleUrl>#falseColor0</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#S_test_SSSSSIIIDSDDDDDISSSDSSSDD">
<SimpleData name="ID">FM3</SimpleData>
<SimpleData name="cname">FM3</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>150.90104,-32.15662800000001,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
</Folder>
</Document>
</kml>
My aim is to obtain the Element values, i.e. 'FM2' from the Elements 'ID'.
I'm trying to use lxml etree. My code is:
from lxml import etree as ET

tree = ET.parse(kml_file)
root = tree.getroot()
for Document in root:
    for Folder in Document:
        for Placemark in Folder:
            for ExtendedData in Placemark:
                for SchemaData in ExtendedData:
                    for SimpleData in SchemaData:
                        print(SimpleData.attrib)
and the output is: {'name': 'ID'} {'name': 'cname'}
How can I get the values of the dict, so that I can get 'FM2' and 'FM3'?
I have spent hours trying to solve this problem. Any help would be much appreciated.
Answer 1:
One of the issues you're having is that when you write for x in y, you're iterating over all children of the current element.
So when you do this:
for Folder in Document:
    ...
you're not just iterating over Folder elements; you're also iterating over name, open, Schema, Style, and StyleMap (namespaces omitted for now).
You could still get what you want by testing the name attribute value and then printing the element's text...
for Document in root:
    for Folder in Document:
        for Placemark in Folder:
            for ExtendedData in Placemark:
                for SchemaData in ExtendedData:
                    for SimpleData in SchemaData:
                        if SimpleData.get("name") == "ID":
                            print(SimpleData.text)
but I would not recommend it.
Instead, consider using XPath 1.0 with lxml's xpath() function.
This will allow you to directly target the elements you're interested in.
For this example I'm going to use the full path instead of the abbreviated // syntax. I'll also use a predicate to test the attribute value.
At first glance you would think that the XPath to all of the SimpleData elements with a name attribute value of "ID" would be:
/kml/Document/Folder/Placemark/ExtendedData/SchemaData/SimpleData[@name='ID']
but this is not the case. Note the xmlns="http://www.opengis.net/kml/2.2" declaration on the root (kml) element. It means that the root element and all of its descendant elements are in the default namespace http://www.opengis.net/kml/2.2 (unless declared otherwise on those elements).
To illustrate, if you added a print(f"In Folder element \"{Folder.tag}\"...") call to your for Folder in Document loop, you'd see:
In Folder element "{http://www.opengis.net/kml/2.2}name"...
In Folder element "{http://www.opengis.net/kml/2.2}open"...
In Folder element "{http://www.opengis.net/kml/2.2}Schema"...
In Folder element "{http://www.opengis.net/kml/2.2}Style"...
In Folder element "{http://www.opengis.net/kml/2.2}StyleMap"...
In Folder element "{http://www.opengis.net/kml/2.2}Style"...
In Folder element "{http://www.opengis.net/kml/2.2}Folder"...
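The namespace point can also be checked directly. The following is a minimal sketch, assuming a trimmed-down stand-in for the question's KML (the KML constant is made up for illustration): the un-prefixed path matches nothing, and etree.QName splits a tag into its namespace and local name.

```python
from lxml import etree

# Trimmed-down stand-in for the KML in the question (an assumption for this sketch).
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<Placemark>
<ExtendedData>
<SchemaData>
<SimpleData name="ID">FM2</SimpleData>
</SchemaData>
</ExtendedData>
</Placemark>
</Folder>
</Document>
</kml>"""

root = etree.fromstring(KML)

# Un-prefixed names in XPath 1.0 refer to elements in no namespace,
# so this path finds nothing in a document with a default namespace.
unprefixed = root.xpath(
    "/kml/Document/Folder/Placemark/ExtendedData/SchemaData/SimpleData[@name='ID']"
)
print(unprefixed)

# QName splits the "{namespace}localname" form of a tag into its two parts.
qname = etree.QName(root.tag)
print(qname.namespace)
print(qname.localname)
```

The empty result from the first query is why the path has to be rewritten with namespace prefixes.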
There are a few ways to handle namespaces in lxml, but I prefer to declare them in a dictionary and pass them with the namespaces argument.
Here's a full example...
from lxml import etree

ns = {"kml": "http://www.opengis.net/kml/2.2"}

tree = etree.parse("test.kml")

for simple_data in tree.xpath("/kml:kml/kml:Document/kml:Folder/kml:Placemark/kml:ExtendedData/kml:SchemaData/kml:SimpleData[@name='ID']", namespaces=ns):
    print(simple_data.text)
Print Output...
FM2
FM3
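If you want every SimpleData value per Placemark rather than just "ID", the same namespaces argument works with relative paths too. This is only a sketch under assumptions: the inline KML is a shortened stand-in for the file in the question, and the names records and fields are made up for illustration.

```python
from lxml import etree

ns = {"kml": "http://www.opengis.net/kml/2.2"}

# Shortened stand-in for the question's file (an assumption for this sketch).
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder id="layer 0">
<Placemark>
<ExtendedData>
<SchemaData schemaUrl="#S_test">
<SimpleData name="ID">FM2</SimpleData>
<SimpleData name="cname">FM2</SimpleData>
</SchemaData>
</ExtendedData>
</Placemark>
<Placemark>
<ExtendedData>
<SchemaData schemaUrl="#S_test">
<SimpleData name="ID">FM3</SimpleData>
<SimpleData name="cname">FM3</SimpleData>
</SchemaData>
</ExtendedData>
</Placemark>
</Folder>
</Document>
</kml>"""

root = etree.fromstring(KML)

records = []
for placemark in root.xpath("//kml:Placemark", namespaces=ns):
    # Relative path (leading ".") so only this Placemark's fields are collected.
    fields = {
        simple_data.get("name"): simple_data.text
        for simple_data in placemark.xpath(".//kml:SimpleData", namespaces=ns)
    }
    records.append(fields)

print(records)
```

Each Placemark becomes one dictionary keyed by the name attribute, so records[0]["ID"] gives the first placemark's ID.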
Answer 2:
For some reason, I ran into problems with the XML validity of your kml_file, so I did it this way:
import lxml.html

tree = lxml.html.fromstring(kml_file)
results = tree.xpath("//*[@name = 'ID']")
for i in results:
    if i.text:
        print(i.text)
I'm not sure this is what you're looking for, but the output is:
FM2
FM3
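If the file does parse as XML, a similar "match on the name attribute" query can be written without the HTML parser and without a namespace dictionary, by comparing local-name() in the XPath. The sketch below assumes a shortened stand-in for the question's KML; restricting the match to SimpleData is a choice made here so that the Schema's SimpleField elements, which also carry name="ID", are not picked up.

```python
from lxml import etree

# Shortened stand-in for the question's file (an assumption for this sketch).
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Schema name="test" id="S_test">
<SimpleField type="string" name="ID"><displayName>ID</displayName></SimpleField>
</Schema>
<Folder>
<Placemark>
<ExtendedData>
<SchemaData schemaUrl="#S_test">
<SimpleData name="ID">FM2</SimpleData>
</SchemaData>
</ExtendedData>
</Placemark>
<Placemark>
<ExtendedData>
<SchemaData schemaUrl="#S_test">
<SimpleData name="ID">FM3</SimpleData>
</SchemaData>
</ExtendedData>
</Placemark>
</Folder>
</Document>
</kml>"""

root = etree.fromstring(KML)

# local-name() ignores the namespace, so no prefix mapping is needed.
values = [
    str(text)
    for text in root.xpath("//*[local-name()='SimpleData'][@name='ID']/text()")
]
print(values)
```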
Source: https://stackoverflow.com/questions/55586376/how-to-obtain-element-values-from-a-kml-by-using-lmxl