elementtree | 易学教程

Trouble parsing multiple XML files and processing them as a DataFrame in Python

Question: I want to parse multiple XML files into a DataFrame. They all use the same XPath. I have used the ElementTree and os Python libraries. The code parses all the files, but it prints an empty DataFrame. However, without the loop over multiple files, it works properly. mypath = r'C:\Users\testFile' files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')] for file in files: xtree = et.parse(file) xroot = xtree.getroot() df_cols=['value'] out_xml=pd.DataFrame(columns=df_cols) for node in xroot.findall(r'.
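In the excerpt, `out_xml=pd.DataFrame(columns=df_cols)` sits inside the `for file in files:` loop. The frame is therefore recreated for every file, and that is one likely cause of the empty result. The sketch below shows the usual fix: gather rows from all files first, then build the DataFrame once at the end. The XPath `.//value` is a placeholder assumption, because the original path is cut off. The helper name `collect_rows` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def collect_rows(folder, xpath=".//value"):
    """Return one dict per matched node across every .xml file in folder.

    Hypothetical helper; the default xpath is a placeholder assumption,
    replace it with the path used in your own files.
    """
    rows = []  # created once, outside the loop, so rows from every file survive
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".xml"):
            continue  # skip non-XML files in the folder
        root = ET.parse(os.path.join(folder, name)).getroot()
        for node in root.findall(xpath):
            rows.append({"file": name, "value": node.text})
    return rows
```

If pandas is installed, `pd.DataFrame(collect_rows(mypath))` then produces one frame with `file` and `value` columns. Building the frame once from a list also avoids growing a DataFrame row by row.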

Traversing TEI in Python 3, text comes up empty for some entities

Question: I have a TEI-encoded XML file with entities as follows: <sp> <speaker rend="italic">Sampson.</speaker> <ab> <lb n="5"/> <hi rend="italic">Gregory:</hi> <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/> </ab> </sp> <sp> <speaker rend="italic">Greg.</speaker> <ab>No, for then we should be Colliars. <lb n="7" rend="rj"/> </ab> </sp> The full file is very large but can be accessed here: http://ota.ox.ac.uk/desc/5721. I'm attempting to use Python 3 to traverse the XML and
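In ElementTree, an element's `.text` contains only the text that comes before its first child. Any text that follows a child element is stored in that child's `.tail`. In the excerpt, `my word wee'l not carry coales.` comes after `<seg>`, so it is `seg.tail`, and `ab.text` is only whitespace. The sketch below uses the snippet as quoted. If the full file declares the TEI namespace, tag names passed to `find()` must be namespace-qualified. It shows `itertext()` collecting text and tails together.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<sp>
<speaker rend="italic">Sampson.</speaker>
<ab> <lb n="5"/> <hi rend="italic">Gregory:</hi> <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/> </ab>
</sp>"""

sp = ET.fromstring(SNIPPET)
ab = sp.find("ab")

# .text is only what precedes the first child: here, a single space
leading = ab.text
# the spoken words after <seg> are stored as the tail of <seg>
seg_tail = ab.find("seg").tail
# itertext() yields every text and tail fragment inside <ab> in document order;
# split/join collapses the layout whitespace between fragments
line = " ".join("".join(ab.itertext()).split())
print(line)
```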

urllib combined with ElementTree

Question: I'm having a few problems parsing simple HTML with the ElementTree module from the Python standard library. This is my source code: from urllib.request import urlopen from xml.etree.ElementTree import ElementTree import sys def main(): site = urlopen("http://1gabba.in/genre/hardstyle") try: html = site.read().decode('utf-8') xml = ElementTree(html) print(xml) print(xml.findall("a")) except: print(sys.exc_info()) if __name__ == '__main__': main() Either this fails, I get the
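Two details in the excerpt explain much of the trouble. First, `ElementTree(html)` does not parse the string. The `ElementTree` class expects an `Element` as its first argument, so markup in a string has to go through `ET.fromstring()`. Second, `findall("a")` matches only direct children of the root, whereas `.//a` searches all descendants. Even with both fixes, most real-world HTML is not well-formed XML, and `fromstring()` will raise `ParseError` on it. The standard library's `html.parser` is more forgiving. The sketch below shows both approaches on made-up markup, without network access. `links_from_xhtml` and `LinkCollector` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def links_from_xhtml(markup):
    """Return href values from well-formed (X)HTML using ElementTree."""
    root = ET.fromstring(markup)  # parse the string into an Element first
    # ".//a" searches every descendant; "a" alone only checks direct children
    return [a.get("href") for a in root.findall(".//a")]


class LinkCollector(HTMLParser):
    """Collect href values from arbitrary, possibly malformed HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")
```

For the page in the question, `collector = LinkCollector()`, then `collector.feed(html)` and `collector.hrefs` would replace the `ElementTree(html)` lines. A narrower `except` clause than the bare `except:` would also make real errors easier to see.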

Parse many XML files to one CSV file

Question: The code below takes an XML file and parses specific elements into a CSV file. I originally had simpler, different code with slightly different output; the code below is the result of a lot of help from here. from xml.etree import ElementTree as ET from collections import defaultdict import csv tree = ET.parse('thexmlfile.xml') root = tree.getroot() with open('output.csv', 'w', newline='') as f: writer = csv.writer(f) start_nodes = root.findall('.//START') headers = ['id',
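To merge many XML files into one CSV, open the output file once and write the header once. Then loop over the input files inside that same `with` block, so every file appends to the same writer. The sketch below makes some assumptions. The `START` tag comes from the excerpt. The field names are hypothetical because the original `headers` list is cut off. Each field is read as the text of a same-named child of `<START>`. The `source_file` column and the function name `xml_files_to_csv` are also illustrative additions.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET


def xml_files_to_csv(pattern, out_path, fields=("id",)):
    """Write rows from every file matching pattern into one CSV.

    fields is a hypothetical column list; each value is read from a child
    element of <START> with the same name (missing children become "").
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source_file", *fields])  # header written once
        for path in sorted(glob.glob(pattern)):
            root = ET.parse(path).getroot()
            for start in root.findall(".//START"):
                writer.writerow(
                    [os.path.basename(path)]
                    + [start.findtext(name, default="") for name in fields]
                )
```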

Undefined entity error while using ElementTree

Question: I have a set of XML files that I need to read and format into a single CSV file. In order to read from the XML files, I have used the solution mentioned here. My code looks like this: from os import listdir import xml.etree.cElementTree as et files = listdir(".../blogs/") for i in range(len(files)): # fname = ".../blogs/" + files[i] f = open(".../blogs/" + files[i], 'r') contents = f.read() tree=et.fromstring(contents) for el in tree.findall('post'): post = el.text f.close() This gives me the
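An "undefined entity" error usually means the files contain HTML named entities such as `&nbsp;`. XML predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`. One workaround is to rewrite every known HTML entity except those five as a numeric character reference before parsing. This is sketched here as an assumption about the data, not a general fix, and it also assumes the files are UTF-8. The sketch imports `xml.etree.ElementTree`, because `xml.etree.cElementTree` was deprecated and then removed in Python 3.9. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Replace HTML named entities (e.g. &nbsp;) with numeric references (&#160;)."""

    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown entities untouched
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(repl, text)


def read_posts(path):
    """Return the text of every <post> child of the root in one blog file."""
    with open(path, encoding="utf-8") as f:  # the with-block closes the file
        root = ET.fromstring(html_entities_to_numeric(f.read()))
    return [el.text for el in root.findall("post")]
```

Returning a list also keeps every post. In the original loop, `post = el.text` overwrites the previous value on each iteration.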

How to output XML entity references

Question: I am using Python xml.etree.ElementTree to output XML. I want to output it with entity references that will be substituted when the XML is parsed. Ordinarily, '&' is escaped as '&amp;' because '&' marks the start of an entity reference. However, I really do want to write an entity reference. For example, I want to write an XML file containing the entity reference &manifestName; : >>> from xml.etree.ElementTree import Element, tostring >>> manifest = Element('manifest') >>> manifest.text = '
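ElementTree always escapes `&` in text and attribute values, and it has no node type for an unexpanded entity reference. A common workaround is to serialize normally and then convert the escaped `&amp;name;` back into `&name;` for the entity names you choose. This is post-processing, not an ElementTree feature. The output only parses if a DTD declares the entity. `manifestName` comes from the question; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def tostring_with_entities(elem, entity_names):
    """Serialize elem, then restore &name; for each listed entity name.

    Post-processing workaround: any literal text '&amp;name;' in the tree
    would be rewritten too.
    """
    xml = ET.tostring(elem, encoding="unicode")
    for name in entity_names:
        xml = xml.replace(f"&amp;{name};", f"&{name};")
    return xml


manifest = ET.Element("manifest")
manifest.text = "&manifestName;"
print(tostring_with_entities(manifest, ["manifestName"]))
# prints: <manifest>&manifestName;</manifest>
```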

xml.etree.ElementTree - Trouble setting xmlns = '…'

Question: I must be missing something. I'm attempting to set up a Google product feed, but I'm having a hard time registering the namespace. Following the directions here: https://support.google.com/merchants/answer/160589 I'm trying to insert: <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"> This is the code: from xml.etree import ElementTree from xml.etree.ElementTree import Element, SubElement, Comment, tostring tree = ElementTree tree.register_namespace('xmlns:g', 'http://base.google.com/ns/1.0')
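Two points stand out in the excerpt. First, `register_namespace()` takes the bare prefix (`'g'`), not `'xmlns:g'`. Second, `tree = ElementTree` is just an alias for the module, not a tree. To get a prefixed element, write the tag in `{uri}local` form; ElementTree then adds the `xmlns:g` declaration to the root when it serializes. The sketch below builds a minimal feed skeleton. The `g:id` child and its value are illustrative only.

```python
import xml.etree.ElementTree as ET

G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)  # prefix only; ElementTree writes xmlns:g itself

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")
# {uri}local is ElementTree's notation for a namespaced tag (example value)
ET.SubElement(item, f"{{{G_NS}}}id").text = "SKU-0001"

feed = ET.tostring(rss, encoding="unicode")
print(feed)
```

`register_namespace()` changes a module-wide mapping, so calling it once at import time is enough.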

Finding and replacing text in ElementTree

Question: I am very new to programming and Python. I am trying to find and replace text in an XML file. Here is my XML file: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE doc PUBLIC "-//MYCOMPANY//DTD XSEIF 1/FAD 110 05 R5//EN" "XSEIF_R5.dtd"> <doc version="XSEIF R5" xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*"> <meta-data></meta-data> <front></front> <body> <chl1><title xml:id="id_881i">Installation</title> <p>To install SDK, perform the tasks
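The root declares a default namespace (`xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*"`), so every element's tag is namespace-qualified. As a result, plain lookups such as `findall('p')` match nothing. The sketch below sidesteps this by walking all elements with `iter()` and replacing text inside both `.text` and `.tail`. The search and replacement strings are made up for illustration, and `replace_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = "urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*"


def replace_text(root, old, new):
    """Replace old with new in every .text and .tail below root; return the count changed."""
    changed = 0
    for elem in root.iter():  # iter() visits every element, whatever its namespace
        for field in ("text", "tail"):
            value = getattr(elem, field)
            if value and old in value:
                setattr(elem, field, value.replace(old, new))
                changed += 1
    return changed


doc = ET.fromstring(
    f'<doc version="XSEIF R5" xmlns="{NS}"><body><chl1>'
    "<title>Installation</title><p>To install SDK, perform the tasks</p>"
    "</chl1></body></doc>"
)
count = replace_text(doc, "SDK", "the SDK")
# tag-specific lookups need the namespace in {uri}tag form
first_p = next(doc.iter(f"{{{NS}}}p"))
print(count, first_p.text)
```

To save the result, call `ET.register_namespace("", NS)` before writing with `ET.ElementTree(doc).write(path, encoding="UTF-8", xml_declaration=True)`. This keeps the default namespace unprefixed instead of producing `ns0:` tags. Note that ElementTree does not write the `<!DOCTYPE>` line back out.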

Parsing RSS with ElementTree in Python

Question: How do you search for namespace-specific tags in XML using ElementTree in Python? I have an XML/RSS document like: <?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.0/" > <channel> <title>sometitle</title> <pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate> <generator>http://wordpress.org/?v=2.5.1<
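ElementTree stores namespaced tags as `{uri}local`. In the feed, `dc:creator` therefore becomes `{http://purl.org/dc/elements/1.1/}creator`. The `find*` methods accept a `namespaces` mapping from prefix to URI, which lets the path keep using `dc:creator`. The prefixes in that mapping do not have to match the ones the document uses. The sketch below runs on a shortened version of the feed, with invented `<item>` contents for illustration.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
  <title>sometitle</title>
  <item><title>Hello</title><dc:creator>admin</dc:creator><wp:post_id>1</wp:post_id></item>
</channel>
</rss>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.0/",
}

root = ET.fromstring(FEED)
posts = []
for item in root.findall("./channel/item"):
    posts.append({
        "title": item.findtext("title"),  # plain RSS element, no namespace
        "creator": item.findtext("dc:creator", namespaces=NS),
        "post_id": item.findtext("wp:post_id", namespaces=NS),
    })

# Clark notation works without a mapping
clark = root.find("channel/item").findtext("{http://purl.org/dc/elements/1.1/}creator")
print(posts, clark)
```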