elementtree

Trouble parsing multiple XML files and processing them as a DataFrame in Python

守給你的承諾、 posted on 2020-01-16 14:11:10
Question: I want to parse multiple XML files into a DataFrame. All of the files share the same XPath. I used ElementTree and the os library. The code parses every file, but it prints an empty DataFrame. The same code works properly when it handles a single file.

    mypath = r'C:\Users\testFile'
    files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]
    for file in files:
        xtree = et.parse(file)
        xroot = xtree.getroot()
        df_cols=['value']
        out_xml=pd.DataFrame(columns=df_cols)
        for node in xroot.findall(r'.
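
The excerpt is cut off, so the real cause is not visible. One common cause of an empty result in loops like this is that the container is re-created on every iteration. The sketch below, which uses only the standard library, creates the row list once before the loop and appends to it for every file. The function name `collect_values`, the sample files, and the XPath `./group/value` are hypothetical stand-ins, not taken from the question.

```python
import os
import tempfile
import xml.etree.ElementTree as et


def collect_values(folder, xpath):
    """Return one row per node matching xpath, across every .xml file in folder."""
    rows = []  # created once, outside the loop, so rows from all files accumulate
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".xml"):
            continue
        xroot = et.parse(os.path.join(folder, name)).getroot()
        for node in xroot.findall(xpath):
            rows.append({"file": name, "value": node.text})
    return rows


# Hypothetical sample files standing in for the ones in the question.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.xml", "1"), ("b.xml", "2")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(f"<root><group><value>{text}</value></group></root>")
    rows = collect_values(folder, "./group/value")

print(rows)
```

If pandas is available, `pd.DataFrame(rows)` should turn the collected list of dicts into a single DataFrame after the loop has finished.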

Traversing TEI in Python 3, text comes up empty for some entities

青春壹個敷衍的年華 posted on 2020-01-16 04:59:08
Question: I have a TEI-encoded XML file with entities as follows:

    <sp>
      <speaker rend="italic">Sampson.</speaker>
      <ab>
        <lb n="5"/>
        <hi rend="italic">Gregory:</hi>
        <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/>
      </ab>
    </sp>
    <sp>
      <speaker rend="italic">Greg.</speaker>
      <ab>No, for then we should be Colliars. <lb n="7" rend="rj"/>
      </ab>
    </sp>

The full file is very large but can be accessed here: http://ota.ox.ac.uk/desc/5721. I'm attempting to use Python 3 to traverse the xml and
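
The question is truncated, so this is only a guess at the symptom. In ElementTree, text that follows a child element is stored in that child's `tail`, not in the parent's `text`, which would make `<ab>` look empty in markup like the above. A minimal sketch, assuming the elements carry no namespace as in the excerpt (the full file may declare one); `full_text` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# First <sp> from the excerpt, kept on one line as quoted.
snippet = """<sp> <speaker rend="italic">Sampson.</speaker> <ab> <lb n="5"/> <hi rend="italic">Gregory:</hi> <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/> </ab> </sp>"""

sp = ET.fromstring(snippet)
ab = sp.find("ab")
seg = ab.find("seg")

# ab.text holds only the whitespace before the first child element.
print(repr(ab.text))
# The words after </seg> live in seg.tail, not in ab.text.
print(repr(seg.tail))


def full_text(elem):
    """Join text and tails of elem and all descendants, collapsing whitespace."""
    return " ".join("".join(elem.itertext()).split())


line = full_text(ab)
print(line)
```

`itertext()` walks text and tails in document order, so it avoids having to handle `text` and `tail` separately for each child.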

Urllib combined with ElementTree

折月煮酒 posted on 2020-01-16 03:28:11
Question: I'm having a few problems parsing simple HTML with the ElementTree module from the Python standard library. This is my source code:

    from urllib.request import urlopen
    from xml.etree.ElementTree import ElementTree
    import sys

    def main():
        site = urlopen("http://1gabba.in/genre/hardstyle")
        try:
            html = site.read().decode('utf-8')
            xml = ElementTree(html)
            print(xml)
            print(xml.findall("a"))
        except:
            print(sys.exc_info())

    if __name__ == '__main__':
        main()

Either this fails, I get the
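
Two things are worth noting about the code above. The `ElementTree(...)` constructor wraps an existing root element; it does not parse a markup string. Real-world HTML is also often not well-formed XML, so an XML parser may reject it. As an alternative technique (not ElementTree), here is a minimal sketch using the standard library's `html.parser` to collect link targets. The class name `LinkCollector` and the inline sample markup are hypothetical; in the question's setting the string would come from `site.read().decode('utf-8')`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical sample; note the unclosed <p> and <br>, which are legal in HTML.
html = '<html><body><p>Unclosed paragraph<br><a href="/one">1</a><a href="/two">2</a></body></html>'

collector = LinkCollector()
collector.feed(html)
links = collector.links
print(links)
```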

Parse many XML files to one CSV file

筅森魡賤 posted on 2020-01-15 09:53:18
Question: The code below takes an XML file and parses specific elements into a CSV file. I previously had simpler code that produced slightly different output; the code below is the outcome of a lot of help from here.

    from xml.etree import ElementTree as ET
    from collections import defaultdict
    import csv

    tree = ET.parse('thexmlfile.xml')
    root = tree.getroot()

    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        start_nodes = root.findall('.//START')
        headers = ['id',
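
To extend this from one file to many, one approach is to open the CSV once, write the header once, and loop over the XML inputs inside that single writer. The sketch below assumes each `START` element has simple child elements named after the CSV columns; the `name` column, the sample documents, and the function name `xml_files_to_csv` are hypothetical, since the original header list is truncated.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_files_to_csv(sources, out, fields):
    """Write one CSV row per START element found in each XML source."""
    writer = csv.writer(out)
    writer.writerow(fields)  # header row is written once, before any file is read
    for source in sources:
        root = ET.parse(source).getroot()
        for start in root.findall(".//START"):
            writer.writerow([start.findtext(field, default="") for field in fields])


# Hypothetical in-memory documents; real code would pass file paths instead.
docs = [
    io.BytesIO(b"<root><START><id>1</id><name>a</name></START></root>"),
    io.BytesIO(b"<root><START><id>2</id><name>b</name></START></root>"),
]
out = io.StringIO(newline="")
xml_files_to_csv(docs, out, ["id", "name"])
print(out.getvalue())
```

With real files, `sources` could be a list of paths and `out` a file opened with `open('output.csv', 'w', newline='')`, as in the question.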

Parse many XML files to one CSV file

谁说胖子不能爱 posted on 2020-01-15 09:53:09
Question: The code below takes an XML file and parses specific elements into a CSV file. I previously had simpler code that produced slightly different output; the code below is the outcome of a lot of help from here.

    from xml.etree import ElementTree as ET
    from collections import defaultdict
    import csv

    tree = ET.parse('thexmlfile.xml')
    root = tree.getroot()

    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        start_nodes = root.findall('.//START')
        headers = ['id',

Undefined entity error while using ElementTree

一曲冷凌霜 posted on 2020-01-15 03:46:28
Question: I have a set of XML files that I need to read and format into a single CSV file. In order to read from the XML files, I have used the solution mentioned here. My code looks like this:

    from os import listdir
    import xml.etree.cElementTree as et

    files = listdir(".../blogs/")
    for i in range(len(files)):
        # fname = ".../blogs/" + files[i]
        f = open(".../blogs/" + files[i], 'r')
        contents = f.read()
        tree=et.fromstring(contents)
        for el in tree.findall('post'):
            post = el.text
        f.close()

This gives me the
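
An "undefined entity" error typically appears when the input uses HTML named entities such as `&nbsp;`, which XML does not predefine. One possible workaround, sketched here under the assumption that the files contain such HTML entities, is to rewrite them as numeric character references before parsing. The helper `replace_html_entities`, the root element name `Blog`, and the sample text are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def replace_html_entities(text):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references (&#160;)."""

    def to_numeric(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names as-is
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", to_numeric, text)


raw = "<Blog><post>caf&eacute; &amp; tea&nbsp;time</post></Blog>"

try:
    ET.fromstring(raw)
    raw_failed = False
except ET.ParseError as exc:
    raw_failed = True
    print("raw input fails:", exc)

tree = ET.fromstring(replace_html_entities(raw))
posts = [el.text for el in tree.findall("post")]
print(posts)
```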

How to output XML entity references

荒凉一梦 posted on 2020-01-14 14:50:27
Question: I am using Python's xml.etree.ElementTree to output XML. I want to output it with entity references that will be substituted when the XML is parsed. Ordinarily '&' is escaped as &amp; because '&' is used to declare entity references. However, I really do want to write an entity reference. For example, I want to write an XML file containing the entity reference &manifestName;:

    >>> from xml.etree.ElementTree import Element, tostring
    >>> manifest = Element('manifest')
    >>> manifest.text = '
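
ElementTree escapes every `&` in text when serializing, so the entity reference cannot be written directly. One crude workaround, offered only as a sketch, is to serialize first and then undo the escaping for the specific reference. This would also rewrite any literal text that happens to read `&manifestName;`, so it only suits documents where that string is always meant as a reference. The DOCTYPE and the replacement value `demo` below are hypothetical and serve only to check that the result parses.

```python
from xml.etree.ElementTree import Element, fromstring, tostring

manifest = Element("manifest")
manifest.text = "&manifestName;"

# tostring() escapes the ampersand, which is the behaviour described above.
escaped = tostring(manifest, encoding="unicode")

# Post-process the serialized string to restore the one intended reference.
restored = escaped.replace("&amp;manifestName;", "&manifestName;")
print(restored)

# Hypothetical internal DTD subset that declares the entity, to verify round-tripping.
doctype = '<!DOCTYPE manifest [<!ENTITY manifestName "demo">]>'
parsed = fromstring(doctype + restored)
print(parsed.text)
```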

xml.etree.ElementTree - Trouble setting xmlns = '…'

让人想犯罪 __ posted on 2020-01-14 05:18:07
Question: I must be missing something. I'm attempting to set up a Google product feed, but am having a hard time registering the namespace. Example: Directions here: https://support.google.com/merchants/answer/160589 Trying to insert: <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"> This is the code:

    from xml.etree import ElementTree
    from xml.etree.ElementTree import Element, SubElement, Comment, tostring

    tree = ElementTree
    tree.register_namespace('xmlns:g', 'http://base.google.com/ns/1.0')
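
`register_namespace` takes the bare prefix (`g`), not the attribute name (`xmlns:g`), and ElementTree only emits the `xmlns:g` declaration once an element or attribute in that namespace is actually used. A minimal sketch under those assumptions; the `channel`, `item`, and `g:price` elements and the price value are illustrative and not taken from the question:

```python
import xml.etree.ElementTree as ET

G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)  # prefix only, without "xmlns:"

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")

# Namespaced tags use the {uri}localname form; the prefix is applied on output.
price = ET.SubElement(item, f"{{{G_NS}}}price")
price.text = "9.99 USD"

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```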

Finding and replacing text in ElementTree

梦想与她 posted on 2020-01-14 03:28:18
Question: I am very new to programming and Python. I am trying to find and replace text in an XML file. Here is my XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <!--Arbortext, Inc., 1988-2008, v.4002-->
    <!DOCTYPE doc PUBLIC "-//MYCOMPANY//DTD XSEIF 1/FAD 110 05 R5//EN" "XSEIF_R5.dtd">
    <doc version="XSEIF R5" xmlns="urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*">
    <meta-data></meta-data>
    <front></front>
    <body>
    <chl1><title xml:id="id_881i">Installation</title>
    <p>To install SDK, perform the tasks
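
One way to do this with ElementTree is to walk every element and replace the string in both `text` and `tail`, which sidesteps having to spell out the default namespace in a path. The sketch below uses an abbreviated, hypothetical version of the document above (no DOCTYPE, and the truncated paragraph closed by hand); `replace_text` and the replacement string are also hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:x-mycompany:r2:reg-doc:1551-fad.110.05:en:*"
ET.register_namespace("", NS)  # serialize the default namespace without an ns0: prefix


def replace_text(root, old, new):
    """Replace old with new in every element's text and tail; return the hit count."""
    hits = 0
    for elem in root.iter():
        for attr in ("text", "tail"):
            value = getattr(elem, attr)
            if value and old in value:
                setattr(elem, attr, value.replace(old, new))
                hits += 1
    return hits


# Abbreviated stand-in for the file in the question.
sample = (
    f'<doc version="XSEIF R5" xmlns="{NS}"><body><chl1>'
    '<title xml:id="id_881i">Installation</title>'
    "<p>To install SDK, perform the tasks</p>"
    "</chl1></body></doc>"
)

root = ET.fromstring(sample)
hits = replace_text(root, "SDK", "the toolkit")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Note that ElementTree does not keep the DOCTYPE or the comment that precedes the root element, so writing the tree back out would drop them from a real file.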

Parsing RSS with Elementtree in Python

ⅰ亾dé卋堺 posted on 2020-01-13 09:04:57
Question: How do you search for namespace-specific tags in XML using ElementTree in Python? I have an XML/RSS document like:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0"
         xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:wfw="http://wellformedweb.org/CommentAPI/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:wp="http://wordpress.org/export/1.0/"
    >
    <channel>
    <title>sometitle</title>
    <pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate>
    <generator>http://wordpress.org/?v=2.5.1<
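
ElementTree's `find`, `findall`, `findtext`, and `iterfind` accept a prefix-to-URI mapping, so namespaced tags can be addressed as `prefix:tag`. A minimal sketch using the `dc` and `content` URIs from the document above; the `item`, `dc:creator`, and `content:encoded` elements and their values are hypothetical, since the excerpt ends before any items appear.

```python
import xml.etree.ElementTree as ET

# Prefixes here are local to the query; only the URIs must match the document.
NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

rss = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>sometitle</title>
    <item>
      <title>First post</title>
      <dc:creator>alice</dc:creator>
      <content:encoded>Hello</content:encoded>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(rss)

# Prefixed lookup via the namespaces mapping.
creators = [
    item.findtext("dc:creator", namespaces=NAMESPACES)
    for item in root.findall("channel/item")
]
bodies = [el.text for el in root.iterfind(".//content:encoded", NAMESPACES)]

# Equivalent lookup with the fully qualified {uri}tag form.
same = root.find(".//{http://purl.org/dc/elements/1.1/}creator").text

print(creators, bodies, same)
```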