elementtree | 易学教程

parsing an xml file for unknown elements using python ElementTree


Question: I wish to extract all the tag names and their corresponding data from a multi-purpose XML file, then save that information into a Python dictionary (e.g. tag = key, data = value). The catch is that the tag names and values are unknown and of unknown quantity. <some_root_name> <tag_x>bubbles</tag_x> <tag_y>car</tag_y> <tag...>42</tag...> </some_root_name> I'm using ElementTree and can successfully extract the root tag and can extract values by referencing the tag names, but haven't been able to
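A minimal sketch of one way to approach the question above, assuming a flat document in which every direct child of the root holds one value. The helper name `children_to_dict` is hypothetical, and the excerpt's `<tag...>` placeholder is replaced by `tag_z` because it is not a valid XML name.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the excerpt; tag_z stands in for "<tag...>".
SAMPLE = """<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag_z>42</tag_z>
</some_root_name>"""


def children_to_dict(root):
    """Map each direct child's tag name to its text (hypothetical helper)."""
    # If a tag name repeats, the later value overwrites the earlier one.
    return {child.tag: child.text for child in root}


root = ET.fromstring(SAMPLE)
result = children_to_dict(root)
print(root.tag, result)
```

No tag names are hard-coded, so the same loop works for any number of children. Values stay strings (`"42"`, not `42`), and nested elements would need a recursive variant.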

Python ElementTree does not like colon in name of processing instruction


Question: The following code: import xml.etree.ElementTree as ET xml = '''\ <?xml version="1.0" encoding="UTF-8"?> <testCaseConfig> <?LazyComment Blah de blah/?> <testCase runLimit="420" name="d1/n1"/> <testCase runLimit="420" name="d1/n2"/> </testCaseConfig>''' root = ET.fromstring(xml) xml2 = xml.replace('LazyComment ', 'LazyComment:') print(xml2) try: root2 = ET.fromstring(xml2) except ET.ParseError: print("\nERROR in xml2!!!\n") xml3 = xml2.replace('testCaseConfig', 'testCaseConfig xmlns:Blah="http

XML line break character entity and Python encoding


Question: I have the following line of code in a Python script: dep11 = ET.SubElement(dep1, "POVCode").text = "#declare lg_quality = LDXQual; #if (lg_quality = 3) #declare lg_quality = 4; #end" My question is in regards to the line-break character entity. I want to see this character entity in the XML output, but the first ampersand keeps getting replaced with the &amp; character entity, which creates a nonsense character entity. I am encoding the file as utf-8. import xml.etree.ElementTree as ET ... with open(
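The escaping behaviour described above can be reproduced with a short sketch. The parent element `dep` and the `&#x0A;` spelling of the line-break reference are assumptions for illustration; the excerpt does not show either.

```python
import xml.etree.ElementTree as ET

dep1 = ET.Element("dep")  # hypothetical parent element

# A character reference typed into the string is treated as literal text,
# so its ampersand is escaped when the element is serialized.
escaped = ET.SubElement(dep1, "POVCode")
escaped.text = "#declare lg_quality = LDXQual;&#x0A;#end"
escaped_xml = ET.tostring(escaped, encoding="unicode")

# A real newline character in the string is written out as a newline.
newline = ET.SubElement(dep1, "POVCode")
newline.text = "#declare lg_quality = LDXQual;\n#end"
newline_xml = ET.tostring(newline, encoding="unicode")

print(escaped_xml)
print(newline_xml)

# Parsing the second form back yields the original text, newline included.
round_trip = ET.fromstring(newline_xml).text
```

ElementTree treats `.text` as character data rather than markup, so a line break is best stored as `"\n"`; the serialized file then contains the newline itself instead of a character reference.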

How to loop through a complicated XML structure in order to transform it to a pandas data frame


Question: I am trying to extract information from an XML file and transform it into a pandas dataframe for the following XML structure: <change user="123" timestamp="2017-09-04T13:58:46.190Z"> <log id="333" action="create"> <property id="52122"> <old/> <new> <item id="562622" toString="Test"/> <item id="033362" toString="Test2"/> </new> </property> <property id="33563"> <new> <item id="44322" toString="Test3"/> </new> </property> <property id="21733"> <old/> <new id="12341212" toString="Test4"/> <

How to write CSV into the next column


Question: I have output that I can write into a CSV. However, because of how I set up my XML-to-text conversion, the output iterates incorrectly. I've tried a lot, from modifying my XML statements to writing to CSV in different ways, but I can't seem to get the rows to match up the way I need them to, because the for-in statements have different depths. I don't really care how it's done, so long as it

Parse large python xml using xmltree


Question: I have a Python script that parses huge XML files (the largest one is 446 MB): try: parser = etree.XMLParser(encoding='utf-8') tree = etree.parse(os.path.join(srcDir, fileName), parser) root = tree.getroot() except Exception, e: print "Error parsing file "+str(fileName) + " Reason "+str(e.message) for child in root: if "PersonName" in child.tag: personName = child.text This is what my XML looks like: <?xml version="1.0" encoding="utf-8"?> <MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema
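For files of this size, incremental parsing is a common alternative to loading the whole tree. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` rather than the `etree` module from the question. The in-memory sample, the `Person` element, and the helper name `iter_person_names` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Small in-memory stand-in for a large file; element names other than
# PersonName are made up for this example.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<MyRoot>
    <Person><PersonName>Alice</PersonName></Person>
    <Person><PersonName>Bob</PersonName></Person>
</MyRoot>"""


def iter_person_names(source):
    """Yield PersonName text while parsing the source incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Substring match, as in the question, also matches namespaced tags.
        if "PersonName" in elem.tag:
            yield elem.text
        # Release this element's children, text and attributes once handled.
        # The emptied element itself stays attached to its parent.
        elem.clear()


names = list(iter_person_names(io.BytesIO(SAMPLE)))
print(names)
```

In real use, a file path or an open binary file would be passed in place of `io.BytesIO(SAMPLE)`.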

Concatenate XML tags to become a dataframe column name


Question: I am currently parsing an XML file and filling a dataframe from it. Suppose we have this toy XML: <A> <AA> <AAA1 period='march'>ONE</AAA1> <AAA2>TWO</AAA2> <AAA3>THREE</AAA3> <AAA4> <B semester='4'>FOUR</B> <C>FIVE</C> <D>SIX</D> </AAA4> </AA> </A> And what I am trying to get is something like: [{A.AA.AAA1.period-march: 'ONE'}, {A.AA.AAA2: 'TWO'}, {A.AA.AAA3: 'THREE'}, {A.AA.AAA4.B.semester-4: 'FOUR'},{A.AA.AAA4.C: 'FIVE'}, {A.AA.AAA4.D: 'SIX'}] , which would be much easier to work with. I have
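The target structure above can be produced with a short recursive walk. This is a sketch under stated assumptions, not the asker's or any library's method: the helper name `flatten` is hypothetical, an element's attributes are appended only to its own key as `name-value`, and elements with whitespace-only text are treated as containers.

```python
import xml.etree.ElementTree as ET

TOY_XML = """<A>
  <AA>
    <AAA1 period='march'>ONE</AAA1>
    <AAA2>TWO</AAA2>
    <AAA3>THREE</AAA3>
    <AAA4>
      <B semester='4'>FOUR</B>
      <C>FIVE</C>
      <D>SIX</D>
    </AAA4>
  </AA>
</A>"""


def flatten(elem, parent_path=""):
    """Return a list of {dotted.path: text} dicts (hypothetical helper)."""
    tag_path = f"{parent_path}.{elem.tag}" if parent_path else elem.tag
    rows = []
    text = (elem.text or "").strip()
    if text:
        # Append this element's attributes to its own key as name-value.
        attrs = [f"{name}-{value}" for name, value in elem.attrib.items()]
        rows.append({".".join([tag_path] + attrs): text})
    for child in elem:
        # Children inherit the tag path only, not the parent's attributes.
        rows.extend(flatten(child, tag_path))
    return rows


rows = flatten(ET.fromstring(TOY_XML))
for row in rows:
    print(row)
```

Merging the single-key dicts into one dict would give a single record that can be passed to a dataframe constructor as one row.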

Child index out of range, python element tree


Question: I am receiving an error I have never received before when trying to run this code: File "BasicEmail.py", line 96, in init_ui root[0][1].text IndexError: child index out of range Abort trap: 6 My code is simple: class EmailBlast(QtWidgets.QWidget): def __init__(self): super().__init__() self.init_ui() def init_ui(self): user_file = 'user_info.xml' tree = ET.parse(user_file) root = tree.getroot() root[0][1].text self.emailLabel = QtWidgets.QLabel("Email:") self.emailListLabel = QtWidgets.QLabel
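The `IndexError` in the traceback above is raised when an element has fewer children than the index requires. The sketch below reproduces it on a made-up document, since the contents of `user_info.xml` are not shown, and adds a hypothetical helper `nested_text` that checks each non-negative index before following it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for user_info.xml: the first child has one child only.
SAMPLE = "<users><user><name>Ann</name></user></users>"


def nested_text(root, *indexes):
    """Follow non-negative child indexes; return None if one is missing."""
    node = root
    for index in indexes:
        if index >= len(node):  # len() of an element is its child count
            return None
        node = node[index]
    return node.text


root = ET.fromstring(SAMPLE)

try:
    root[0][1].text
    raised = False
except IndexError:
    # Same failure as in the question: root[0] has no second child.
    raised = True

first = nested_text(root, 0, 0)
missing = nested_text(root, 0, 1)
print(raised, first, missing)
```

Printing `len(root)` and `len(root[0])` for the real file would show which level is shorter than expected.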

Difficulty parsing a section of XML file with ElementTree


Question: I have written the code below to parse this XML file. It's still a bit messy, but I'm on the right track for most of it. The one part I'm stuck on is the 'targets' section (I've left the code that I've tried for this section in here in triple quotes, but that section doesn't work). Could someone show me where I'm going wrong and how to parse the targets section? If you look at the HTML of the XML file here, I basically just want to

Scraping XML element attributes with beautifulsoup


Question: I have the following code: from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("https://api.stlouisfed.org/fred/...") bsObj = BeautifulSoup(html.read(), "lxml"); print(bsObj) It returns something like this: <?xml version="1.0" encoding="utf-8" ?><html><body><observations count="276" file_type="xml" limit="100000" observation_end="9999-12-31" observation_start="1776-07-04" offset="0" order_by="observation_date" output_type="1" realtime_end="2016-06-22" realtime