elementtree

How to extract information after a node in XML with Python?

…衆ロ難τιáo~ submitted on 2021-01-28 12:11:54
Question: I have the following XML structure (a very large file with many more person entries):
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE population SYSTEM "http://www.matsim.org/files/dtd/population_v6.dtd">
    <population desc="Switzerland Baseline">
      <attributes>
        <attribute name="coordinateReferenceSystem" class="java.lang.String" >Atlantis</attribute>
      </attributes>
      <!-- ====================================================================== -->
      <person id="10">
        <attributes>
          <attribute name="age" class=
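For a file this large, one possible approach is to stream it with `xml.etree.ElementTree.iterparse` and handle each person as soon as its closing tag is seen. The sketch below is only an illustration: the sample data, the `java.lang.Integer` class value, the ages, and the helper name `person_attributes` are assumptions, because the question's XML is cut off.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the real file is much larger and its
# person entries are truncated in the question.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<population desc="Switzerland Baseline">
  <attributes>
    <attribute name="coordinateReferenceSystem" class="java.lang.String">Atlantis</attribute>
  </attributes>
  <person id="10">
    <attributes>
      <attribute name="age" class="java.lang.Integer">42</attribute>
    </attributes>
  </person>
  <person id="11">
    <attributes>
      <attribute name="age" class="java.lang.Integer">27</attribute>
    </attributes>
  </person>
</population>
"""


def person_attributes(source):
    """Yield (person id, {attribute name: text}) one person at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "person":
            # Only attributes nested inside this person are collected.
            attrs = {a.get("name"): a.text for a in elem.iter("attribute")}
            yield elem.get("id"), attrs
            elem.clear()  # drop the finished subtree's children to limit memory use


people = dict(person_attributes(io.BytesIO(SAMPLE)))
print(people)
```

Passing a file path or an open binary file instead of the `BytesIO` object should work the same way.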

lxml and xml namespaces - Using find and findall to get XML Tag Value

冷暖自知 submitted on 2021-01-28 05:31:02
Question: I had trouble getting the text values of nodes with lxml when the XML text contains namespaces. I was using findall('Status'), but the result was always empty. I eventually arrived at the following working code. Is this the correct way to use lxml to fetch node values? Can I improve it further?
    import lxml
    xml_string='<?xml version="1.0" encoding="UTF-8"?> <SCPP:Response xmlns:SCPP="http://www.SCPP.com/XMLSchema"> <SCPP:RESP_BODY> <Seed>001335834994</Seed> </SCPP
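A minimal sketch of namespace-aware lookups follows. It uses the standard library's `xml.etree.ElementTree`, whose `find`/`findall` accept a prefix-to-URI mapping in the same way lxml's do. The document is a hypothetical completion of the truncated one: only the SCPP namespace URI, `RESP_BODY`, and `Seed` come from the question, and the position and value of `Status` are assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of the truncated document.
xml_string = """<SCPP:Response xmlns:SCPP="http://www.SCPP.com/XMLSchema">
  <SCPP:RESP_BODY>
    <Seed>001335834994</Seed>
    <Status>OK</Status>
  </SCPP:RESP_BODY>
</SCPP:Response>"""

# Map the prefix used in the path expressions to the namespace URI.
ns = {"SCPP": "http://www.SCPP.com/XMLSchema"}
root = ET.fromstring(xml_string)

# Prefixed elements need the mapping; unprefixed ones here are in no namespace.
body = root.find("SCPP:RESP_BODY", ns)
seed = body.findtext("Seed")

# A bare tag name only matches direct children, so search descendants with ".//".
direct_children = root.findall("Status")
statuses = [e.text for e in root.findall(".//Status")]
```

In this sample `findall("Status")` on the root finds nothing because `Status` is not a direct child of the root, which may be what happened in the question.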

Finding and Extracting Data using XML in Python

狂风中的少年 submitted on 2021-01-28 04:03:53
Question: You could work from the top of the XML down to the comments node and then loop through the child nodes of the comments node. I am sure this is what I need to do, but I'm not sure how to go about it. I have an XML data structure similar to:
    <level>
      <name>Matthias</name>
      <age>23</age>
      <gender>Male</gender>
    </level>
    ...
I am trying to present the name, age and character gender to the user by extracting the data into Python for data validation, processing and output. How do I extract only
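One way to pull those three fields out is sketched below with the standard library. The `<levels>` wrapper element is an assumption, since the question shows only a single `<level>` followed by an ellipsis.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <levels>; the question only shows <level>.
xml_text = """<levels>
  <level><name>Matthias</name><age>23</age><gender>Male</gender></level>
</levels>"""

root = ET.fromstring(xml_text)
records = []
for level in root.iter("level"):
    records.append({
        "name": level.findtext("name"),
        "age": int(level.findtext("age")),  # convert so the value can be validated as a number
        "gender": level.findtext("gender"),
    })

for record in records:
    print(record["name"], record["age"], record["gender"])
```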

Parsing HTML page containing & using Python

て烟熏妆下的殇ゞ submitted on 2021-01-27 16:13:12
Question: I am trying to parse an HTML page in Python using urllib2 and ElementTree, and I am having trouble parsing the HTML. The web page contains "&" within a quoted string, but ElementTree throws a ParseError for lines containing &. Script:
    import urllib2
    url = 'http://eciresults.nic.in/ConstituencywiseU011.htm'
    req = urllib2.Request(url, headers={'Content-type': 'text/xml'})
    r = urllib2.urlopen(req).read()
    import xml.etree.ElementTree as ET
    htmlpage=ET.fromstring(r)
This throws the following error in Python 2.7
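A bare `&` is not well-formed XML, so an XML parser is expected to reject it, whereas an HTML parser tolerates it. The sketch below shows that contrast in Python 3 using the standard library's `html.parser`. The `page` string is a hypothetical stand-in for the fetched document, and `TextCollector` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment standing in for the fetched page.
page = '<html><body><a href="x?a=1&b=2">Tom & Jerry</a></body></html>'

# An XML parser rejects the unescaped ampersands.
try:
    ET.fromstring(page)
    xml_ok = True
except ET.ParseError:
    xml_ok = False


class TextCollector(HTMLParser):
    """Collect the text content of a page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# An HTML parser accepts the same input.
collector = TextCollector()
collector.feed(page)
collector.close()
text = "".join(collector.chunks)
```

Third-party HTML parsers such as `lxml.html` or BeautifulSoup are common alternatives when a tree structure is needed rather than callbacks.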

How to remove empty XML tags, containing whitespace only, in XML?

心已入冬 submitted on 2021-01-27 12:42:04
Question: I need to remove cases like this: <text> </text> I have code that works when there is no whitespace, but what if there is whitespace? Code:
    from lxml import etree
    doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")
    def remove_empty_elements(doc):
        # Select elements with no child nodes at all and detach them.
        for element in doc.xpath('//*[not(node())]'):
            element.getparent().remove(element)
I also need to do it with lxml and not BeautifulSoup.
Answer 1: This XPath, //*[not(*)][not(normalize-space())] will select all leaf elements with only whitespace content. For
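The same selection rule (no child elements, and text that is empty or whitespace only) can also be written without XPath. The sketch below deliberately swaps in the standard library's `xml.etree.ElementTree` so that it runs without lxml. It is not the answer's lxml code, and `remove_blank_leaves` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def remove_blank_leaves(root):
    """Remove leaf elements whose text is empty or whitespace only."""
    # Snapshot the elements first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if len(child) == 0 and not (child.text or "").strip():
                parent.remove(child)  # this also discards the child's tail text


doc = ET.fromstring("<root><a>1</a><b><c> </c></b><d></d><text>   </text></root>")
remove_blank_leaves(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Like a single XPath evaluation, this is one pass: an element that only becomes empty because its children were removed (`b` here) is kept unless the function is run again.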

How do I wrap the contents of a SubElement in an XML tag in Python 3?

旧街凉风 submitted on 2021-01-27 07:57:08
Question: I have a sample XML file like this:
    <root> She <opt>went</opt> <opt>didn't go</opt> to school. </root>
I want to create a subelement of root named sentence and put all the contents of root into it. That is,
    <root> <sentence> She <opt>went</opt> <opt>didn't go</opt> to school. </sentence> </root>
I know how to make a subelement with ElementTree or lxml, but I have no idea how to select everything from "She" to "school." at once.
    import lxml.etree as ET
    ET.SubElement(root, 'sentence')
I'm lost...
Answer 1: You could go
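One possible way to do the wrapping is sketched below, assuming the standard library's `xml.etree.ElementTree` (the question imports lxml instead). It relies on the mixed content being stored as the root's `text` plus each child's `tail`, so moving the leading text and the children moves everything.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root> She <opt>went</opt> <opt>didn't go</opt> to school. </root>"
)

sentence = ET.Element("sentence")

# The text before the first child belongs to the parent's .text.
sentence.text = root.text
root.text = None

# Each child carries the text that follows it in its .tail, so moving the
# children moves the rest of the sentence with them.
for child in list(root):
    root.remove(child)
    sentence.append(child)

root.append(sentence)
wrapped = ET.tostring(root, encoding="unicode")
print(wrapped)
```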

What is the difference between an ElementTree and an Element? (python xml)

末鹿安然 submitted on 2021-01-21 08:06:20
Question:
    from xml.etree.ElementTree import ElementTree, Element, SubElement, dump
    elem = Element('1')
    sub = SubElement(elem, '2')
    tree = ElementTree(elem)
    dump(tree)
    dump(elem)
In the code above, dumping tree (which is an ElementTree) and dumping elem (which is an Element) produces the same output, so I am having trouble seeing what the difference between the two is.
Answer 1: "dumping tree (which is an ElementTree) and dumping elem (which is an Element) results in the same thing." dump()
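A short sketch of the distinction as the standard library models it: an `Element` is a single node with a tag, attributes, text and children, while an `ElementTree` wraps a root `Element` and adds document-level operations such as `getroot()` and `write()`. The tag names `root` and `child` are illustrative and replace the question's `'1'` and `'2'`, which are not valid XML names.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")        # an Element is one node in the tree
ET.SubElement(root, "child")     # children are Elements too

tree = ET.ElementTree(root)      # an ElementTree wraps the root as a whole document
same_object = tree.getroot() is root

# Document-level serialization, including the XML declaration, lives on the tree.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
document = buffer.getvalue().decode("utf-8")
print(document)
```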

parse xml to pandas data frame in python

蓝咒 submitted on 2021-01-20 07:10:31
Question: I am trying to read an XML file and convert it to a pandas DataFrame. However, it returns empty data. This is a sample of the XML structure:
    <Instance ID="1">
      <MetaInfo StudentID ="DTSU040" TaskID="LP03_PR09.bLK.sh" DataSource="DeepTutorSummer2014"/>
      <ProblemDescription>A car windshield collides with a mosquito, squashing it.</ProblemDescription>
      <Question>How does this work tion?</Question>
      <Answer>tthis is my best </Answer>
      <Annotation Label="correct(0)|correct_but_incomplete(1)|contradictory(0)|incorrect
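A common pattern is to flatten each `Instance` into a dictionary first and build the frame from the list of dictionaries. The sketch below covers only the flattening step with the standard library. The `<Instances>` wrapper is an assumption, and the `Annotation` element is left out because it is truncated in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical <Instances> wrapper and trimmed sample data.
xml_text = """<Instances>
  <Instance ID="1">
    <MetaInfo StudentID="DTSU040" TaskID="LP03_PR09.bLK.sh" DataSource="DeepTutorSummer2014"/>
    <ProblemDescription>A car windshield collides with a mosquito, squashing it.</ProblemDescription>
    <Question>How does this work tion?</Question>
    <Answer>tthis is my best </Answer>
  </Instance>
</Instances>"""

root = ET.fromstring(xml_text)
rows = []
for instance in root.iter("Instance"):
    row = {"ID": instance.get("ID")}
    row.update(instance.find("MetaInfo").attrib)  # each attribute becomes a column
    for tag in ("ProblemDescription", "Question", "Answer"):
        row[tag] = (instance.findtext(tag) or "").strip()
    rows.append(row)

print(rows)
```

Passing `rows` to `pandas.DataFrame(rows)` should then give one row per `Instance`, with the dictionary keys as column names.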