elementtree

Parsing an XML file for unknown elements using Python ElementTree

放肆的年华 posted on 2019-12-14 03:48:38
Question: I wish to extract all the tag names and their corresponding data from a multi-purpose XML file, then save that information into a Python dictionary (e.g. tag = key, data = value). The catch is that the tag names and values are unknown and of unknown quantity.

    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name>

I'm using ElementTree and can successfully extract the root tag and can extract values by referencing the tag names, but haven't been able to
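A minimal sketch of one way to collect unknown child tags into a dictionary, assuming the data sits one level below the root as in the sample above. The name `tag_z` stands in for the elided tag and `children_to_dict` is a hypothetical helper, not something from the question.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the question; tag_z is a placeholder for the elided tag name.
SAMPLE = """<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag_z>42</tag_z>
</some_root_name>"""


def children_to_dict(xml_text):
    """Map each direct child's tag name to its text, without knowing the tags in advance."""
    root = ET.fromstring(xml_text)
    # Iterating over an Element yields its direct children in document order.
    return {child.tag: child.text for child in root}


result = children_to_dict(SAMPLE)
print(result)
```

Values come back as strings, and a repeated tag name would overwrite the earlier entry, so a document with duplicate tags would need a list per key instead.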

Python ElementTree does not like colon in name of processing instruction

∥☆過路亽.° posted on 2019-12-14 02:01:45
Question: The following code:

    import xml.etree.ElementTree as ET

    xml = '''\
    <?xml version="1.0" encoding="UTF-8"?>
    <testCaseConfig>
      <?LazyComment Blah de blah/?>
      <testCase runLimit="420" name="d1/n1"/>
      <testCase runLimit="420" name="d1/n2"/>
    </testCaseConfig>'''

    root = ET.fromstring(xml)
    xml2 = xml.replace('LazyComment ', 'LazyComment:')
    print(xml2)
    try:
        root2 = ET.fromstring(xml2)
    except ET.ParseError:
        print("\nERROR in xml2!!!\n")
    xml3 = xml2.replace('testCaseConfig', 'testCaseConfig xmlns:Blah="http

XML line break character entity and Python encoding

孤街浪徒 posted on 2019-12-13 23:36:46
Question: I have the following line of code in a Python script:

    dep11 = ET.SubElement(dep1, "POVCode").text = "#declare lg_quality = LDXQual;&#x0A;#if (lg_quality = 3)&#x0A;#declare lg_quality = 4;&#x0A;#end"

My question is in regard to the &#x0A; character. I want to see this character entity in the XML output, but the first ampersand keeps getting replaced with the &amp; character entity, which creates the nonsense character entity &amp;#x0A;. I am encoding the file as utf-8.

    import xml.etree.ElementTree as ET
    ...
    with open(
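The behaviour described above can be reproduced in a few lines. This is only a sketch of why the ampersand gets escaped and of one possible workaround (putting a real newline in the text), not necessarily what the asker ended up doing; the shortened string is an assumption for brevity.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("POVCode")

# Text containing the literal characters "&#x0A;": the serializer escapes the
# ampersand, because element text is treated as data, not as markup.
elem.text = "#declare lg_quality = LDXQual;&#x0A;#if (lg_quality = 3)"
escaped = ET.tostring(elem, encoding="unicode")
print(escaped)

# Text containing a real line feed instead: it is written as a newline
# character and read back unchanged by a parser.
elem.text = "#declare lg_quality = LDXQual;\n#if (lg_quality = 3)"
with_newline = ET.tostring(elem, encoding="unicode")
round_trip = ET.fromstring(with_newline).text

# In element content, the reference &#x0A; and a literal line feed parse to
# the same character.
from_reference = ET.fromstring("<POVCode>a&#x0A;b</POVCode>").text
```

If the consumer of the file parses it as XML, the literal newline and the `&#x0A;` reference are equivalent after parsing, so the second form may be enough.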

How to loop through a complicated XML structure in order to transform it to a pandas data frame

China☆狼群 posted on 2019-12-13 10:18:18
Question: I am trying to extract information from an XML file and transform it into a pandas dataframe for the following XML structure:

    <change user="123" timestamp="2017-09-04T13:58:46.190Z">
      <log id="333" action="create">
        <property id="52122">
          <old/>
          <new>
            <item id="562622" toString="Test"/>
            <item id="033362" toString="Test2"/>
          </new>
        </property>
        <property id="33563">
          <new>
            <item id="44322" toString="Test3"/>
          </new>
        </property>
        <property id="21733">
          <old/>
          <new id="12341212" toString="Test4"/>
          <
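A rough sketch of how the nested structure might be flattened into one row per `<item>`, using only the part of the sample that is fully visible (the first two properties). The column names and the `change_to_rows` helper are assumptions made for illustration; the question's intended output is not shown.

```python
import xml.etree.ElementTree as ET

# Trimmed to the two properties that appear in full in the question.
SAMPLE = """<change user="123" timestamp="2017-09-04T13:58:46.190Z">
  <log id="333" action="create">
    <property id="52122">
      <old/>
      <new>
        <item id="562622" toString="Test"/>
        <item id="033362" toString="Test2"/>
      </new>
    </property>
    <property id="33563">
      <new>
        <item id="44322" toString="Test3"/>
      </new>
    </property>
  </log>
</change>"""


def change_to_rows(xml_text):
    """Return one dict per <new>/<item>, carrying the ancestor attributes along."""
    change = ET.fromstring(xml_text)
    rows = []
    for log in change.iter("log"):
        for prop in log.iter("property"):
            for item in prop.findall("./new/item"):
                rows.append({
                    "user": change.get("user"),
                    "timestamp": change.get("timestamp"),
                    "log_id": log.get("id"),
                    "action": log.get("action"),
                    "property_id": prop.get("id"),
                    "item_id": item.get("id"),
                    "value": item.get("toString"),
                })
    return rows


rows = change_to_rows(SAMPLE)
for row in rows:
    print(row)
```

A list of dicts like this can be passed straight to `pandas.DataFrame(rows)`. Properties that keep their data on `<new>` itself rather than on `<item>` children, like the third one in the question, would need an extra branch.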

How to write CSV into the next column

主宰稳场 posted on 2019-12-13 08:43:08
Question: I have output that I can write into a CSV. However, because of how I set up my XML to text, the output iterates incorrectly. I've tried a lot to fix my XML output, from modifying my XML statements to writing to CSV in different ways, but I can't seem to get the rows to match up the way I need them to, because of the for-in statements that have different depths. I don't really care how it's done, so long as it

Parse large python xml using xmltree

时间秒杀一切 posted on 2019-12-13 03:20:37
Question: I have a Python script that parses huge XML files (the largest one is 446 MB):

    try:
        parser = etree.XMLParser(encoding='utf-8')
        tree = etree.parse(os.path.join(srcDir, fileName), parser)
        root = tree.getroot()
    except Exception, e:
        print "Error parsing file "+str(fileName) + " Reason "+str(e.message)

    for child in root:
        if "PersonName" in child.tag:
            personName = child.text

This is what my XML looks like:

    <?xml version="1.0" encoding="utf-8"?>
    <MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema
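For files of that size, one commonly used alternative to building the whole tree is streaming with `iterparse`. The sketch below swaps in the standard library's `xml.etree.ElementTree.iterparse` rather than the `etree.parse` call from the question; the sample document and the `iter_person_names` helper are hypothetical, since the real file layout is cut off above.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the large file; only the PersonName tag comes from the question.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<MyRoot>
  <PersonName>Alice</PersonName>
  <Other>ignored</Other>
  <PersonName>Bob</PersonName>
</MyRoot>"""


def iter_person_names(source):
    """Yield the text of every element whose tag contains 'PersonName'."""
    # "end" events fire once an element has been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if "PersonName" in elem.tag:
            yield elem.text
        # clear() drops the element's text, attributes and children; the empty
        # element itself stays attached to its parent.
        elem.clear()


names = list(iter_person_names(io.BytesIO(SAMPLE)))
print(names)
```

`iterparse` also accepts a file path, so the same generator could be fed the path built with `os.path.join(srcDir, fileName)`.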

Concatenate XML tags to become a dataframe column name

亡梦爱人 posted on 2019-12-13 03:20:33
Question: I am currently parsing an XML file and filling a dataframe from it. Suppose we have this toy XML:

    <A>
      <AA>
        <AAA1 period='march'>ONE</AAA1>
        <AAA2>TWO</AAA2>
        <AAA3>THREE</AAA3>
        <AAA4>
          <B semester='4'>FOUR</B>
          <C>FIVE</C>
          <D>SIX</D>
        </AAA4>
      </AA>
    </A>

What I am trying to get is something like:

    [{A.AA.AAA1.period-march: 'ONE'}, {A.AA.AAA2: 'TWO'}, {A.AA.AAA3: 'THREE'}, {A.AA.AAA4.B.semester-4: 'FOUR'}, {A.AA.AAA4.C: 'FIVE'}, {A.AA.AAA4.D: 'SIX'}]

which would be much easier to work with. I have
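One possible way to build those dotted keys is a small recursive walk. This is a sketch under the assumption that only leaf elements carry values and that attributes on a leaf are appended as `name-value`, as in the desired output; `flatten` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<A>
  <AA>
    <AAA1 period='march'>ONE</AAA1>
    <AAA2>TWO</AAA2>
    <AAA3>THREE</AAA3>
    <AAA4>
      <B semester='4'>FOUR</B>
      <C>FIVE</C>
      <D>SIX</D>
    </AAA4>
  </AA>
</A>"""


def flatten(elem, prefix=""):
    """Return a list of single-entry dicts keyed by the dotted path to each leaf."""
    path = f"{prefix}.{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if children:
        # Not a leaf: recurse and concatenate the children's records.
        records = []
        for child in children:
            records.extend(flatten(child, path))
        return records
    # Leaf: append each attribute as ".name-value" to the path.
    key = path
    for name, value in elem.attrib.items():
        key += f".{name}-{value}"
    return [{key: (elem.text or "").strip()}]


records = flatten(ET.fromstring(SAMPLE))
print(records)
```

Attributes on non-leaf elements are ignored in this sketch; merging the single-entry dicts into one dict would give a single row for a dataframe.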

Child index out of range, python element tree

自作多情 posted on 2019-12-13 02:55:56
Question: I am receiving an error I have never received before when trying to run this code.

    File "BasicEmail.py", line 96, in init_ui
        root[0][1].text
    IndexError: child index out of range
    Abort trap: 6

My code is simple:

    class EmailBlast(QtWidgets.QWidget):
        def __init__(self):
            super().__init__()
            self.init_ui()

        def init_ui(self):
            user_file = 'user_info.xml'
            tree = ET.parse(user_file)
            root = tree.getroot()
            root[0][1].text
            self.emailLabel = QtWidgets.QLabel("Email:")
            self.emailListLabel = QtWidgets.QLabel
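The `IndexError` is raised when an element has fewer children than the index being requested. The sketch below shows one defensive way to read a nested child; the contents of `user_info.xml` are not shown in the question, so the sample document and the `child_text` helper are assumptions.

```python
import xml.etree.ElementTree as ET


def child_text(root, *indexes):
    """Follow non-negative child indexes from root; return None if any level is too short."""
    node = root
    for i in indexes:
        # len(element) is the number of direct children.
        if i >= len(node):
            return None
        node = node[i]
    return node.text


# Hypothetical file contents: the first <user> has only one child.
root = ET.fromstring("<users><user><name>Ann</name></user></users>")

first = child_text(root, 0, 0)    # the child exists
missing = child_text(root, 0, 1)  # root[0][1] would raise IndexError here
print(first, missing)
```

Printing `len(root)` and `len(root[0])` before indexing is a quick way to check whether the file has the shape the code expects.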

Difficulty parsing a section of XML file with ElementTree

梦想的初衷 posted on 2019-12-13 02:26:38
Question: I have written the code below to parse this XML file. It's still a bit messy, but I'm on the right track for most of it. The one part I'm stuck on is the 'targets' section (I've left the code that I've tried for this section in here with triple quotes, but that section doesn't work). I'm wondering if someone could show me where I'm going wrong and how to parse the targets section. If you look at the HTML of the XML file here, I basically just want to

Scraping XML element attributes with beautifulsoup

♀尐吖头ヾ posted on 2019-12-13 02:19:01
Question: I have the following code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("https://api.stlouisfed.org/fred/...")
    bsObj = BeautifulSoup(html.read(), "lxml")
    print(bsObj)

It returns something like this:

    <?xml version="1.0" encoding="utf-8" ?><html><body><observations count="276" file_type="xml" limit="100000" observation_end="9999-12-31" observation_start="1776-07-04" offset="0" order_by="observation_date" output_type="1" realtime_end="2016-06-22" realtime