elementtree

Parsing very large HTML file with Python (ElementTree?)

强颜欢笑 submitted on 2019-12-06 13:44:42
I asked about using BeautifulSoup to parse a very large (270 MB) HTML file and getting a memory error, and was pointed toward ElementTree as a solution. I was trying to use its event-driven parsing, documented here. Testing it with the smaller settings file worked fine: >>> settings = open('S:\\Documents\\FacebookData\\html\\settings.htm') >>> for event, element in ET.iterparse(settings, events=("start", "end")): print("%5s, %4s, %s" % (event, element.tag, element.text)) This successfully prints out the elements. However, using that same code with 'messages.htm' instead of 'settings.htm' just to
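For reference, a minimal sketch of memory-friendly event-driven parsing with ET.iterparse. It assumes the input is well-formed XML (iterparse is an XML parser, so arbitrary HTML may fail to parse), and the helper name stream_elements and the in-memory sample document are made up for illustration. The idea is to listen only for "end" events and clear each element once it has been handled, so the tree does not keep growing.

```python
import io
import xml.etree.ElementTree as ET


def stream_elements(source, tag):
    """Yield (tag, text) for every completed `tag` element.

    `source` is a filename or a binary file object. Each matching element
    is cleared after the caller has consumed it, so its text and children
    can be garbage collected instead of accumulating in memory.
    """
    for event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element.tag, (element.text or "").strip()
            element.clear()  # drop text, attributes and children


# Hypothetical stand-in for a large file on disk.
sample = io.BytesIO(b"<html><body><p>one</p><p>two</p></body></html>")
messages = list(stream_elements(sample, "p"))
print(messages)
```

Note that clear() empties an element but leaves the empty shell attached to its parent, so for extremely large files further pruning of the parent may still be needed.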

Write xml with a path and value

自闭症网瘾萝莉.ら submitted on 2019-12-06 13:01:22
Question: I have a list of paths and values, something like this: [ {'Path': 'Item/Info/Name', 'Value': 'Body HD'}, {'Path': 'Item/Genres/Genre', 'Value': 'Action'}, ] And I want to build out the full XML structure, which would be: <Item> <Info> <Name>Body HD</Name> </Info> <Genres> <Genre>Action</Genre> </Genres> </Item> Is there a way to do this with lxml? Or how could I build a function to fill in the inferred paths? Answer 1: You could do something like: l = [ {'Path': 'Item/Info/Name', 'Value': 'Body
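One possible way to fill in the inferred paths, shown here as a sketch and not as the answer's own code (which is cut off above). It uses the standard library's xml.etree.ElementTree; the calls used (Element, SubElement, find) also exist in lxml.etree. The function name build_tree is hypothetical, and the sketch assumes every path shares the same root tag and that a repeated path should reuse the existing element.

```python
import xml.etree.ElementTree as ET


def build_tree(entries):
    """Build one element tree from a list of {'Path': ..., 'Value': ...} dicts."""
    root = None
    for entry in entries:
        parts = entry["Path"].split("/")
        if root is None:
            root = ET.Element(parts[0])
        elif root.tag != parts[0]:
            raise ValueError("all paths must share the same root tag")
        node = root
        for tag in parts[1:]:
            child = node.find(tag)  # reuse an existing child if present
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = entry["Value"]
    return root


paths = [
    {"Path": "Item/Info/Name", "Value": "Body HD"},
    {"Path": "Item/Genres/Genre", "Value": "Action"},
]
xml_text = ET.tostring(build_tree(paths), encoding="unicode")
print(xml_text)
```

Because existing children are reused, two entries with the same path overwrite each other; a list with several Genre values would need a rule for when to create a sibling instead.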

Is there a key for the default namespace when creating dictionary for use with xml.etree.ElementTree.findall() in Python?

别等时光非礼了梦想. submitted on 2019-12-06 10:55:12
I'm trying to parse an XML document with a default namespace, i.e. the root node has an xmlns attribute. This is annoying if you want to find certain tags in the child nodes, because each tag is prefixed with the default namespace. xml.etree.ElementTree.findall() allows a namespaces dictionary to be passed in, but I can't seem to find what the default namespace is mapped to. I have tried using 'default', None, and 'xmlns' with no success. One option that does seem to work is to prefix the tag passed to findall() with 'xmlns:' (EDIT: this can be any arbitrary unique name actually) and a
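A small sketch of the two approaches that work for a default namespace: mapping an arbitrary prefix to the namespace URI, or writing the URI inline in {uri}tag form. The URI http://example.com/ns, the prefix d, and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose root declares a default namespace.
DOC = '<root xmlns="http://example.com/ns"><item>a</item><item>b</item></root>'
root = ET.fromstring(DOC)

# Option 1: map any prefix of your choosing to the namespace URI.
ns = {"d": "http://example.com/ns"}
via_prefix = root.findall("d:item", ns)

# Option 2: spell out the namespace inline with {uri}tag notation.
via_clark = root.findall("{http://example.com/ns}item")

# A bare tag name does not match namespaced elements.
bare = root.findall("item")

print(len(via_prefix), len(via_clark), len(bare))
```

The prefix in the dictionary does not have to match anything in the document; only the URI matters.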

Error importing a python module in Django

回眸只為那壹抹淺笑 submitted on 2019-12-06 10:52:52
Question: In my Django project, the following line throws an ImportError: "No module named elementtree". from elementtree import ElementTree However, the module is installed (i.e., I can run an interactive Python shell and type that exact line without any ImportError), and the directory containing the module is on the PYTHONPATH. But when I access any page in a browser, it somehow can't find the module and throws the ImportError. What could be causing this? Answer 1: Can you import elementtree within the
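As a hedged aside, not taken from the truncated answer: a sketch of a defensive import plus a quick way to see which interpreter and search path a process is really using, since a web server process may not share the environment of an interactive shell. The fallback relies on the fact that ElementTree has shipped in the standard library as xml.etree.ElementTree since Python 2.5; the helper name describe_environment is made up.

```python
import sys

try:
    # Standalone third-party package, as imported in the question.
    from elementtree import ElementTree
except ImportError:
    # Standard-library copy, available since Python 2.5.
    from xml.etree import ElementTree


def describe_environment():
    """Return the details that usually explain a 'works in the shell only' import."""
    return {
        "executable": sys.executable,
        "path": list(sys.path),
        "module": ElementTree.__name__,
    }


info = describe_environment()
print(info["executable"])
print(info["module"])
```

Calling such a helper from inside a view and from the interactive shell and comparing the two results is one way to spot a differing sys.path.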

Python 3.4 : How to do xml validation

别来无恙 submitted on 2019-12-06 09:18:06
I'm trying to validate XML against an XSD in Python. I was successful using the lxml package, but the problems started when I tried to port my code to Python 3.4. I tried to install lxml for version 3.4, but it looks like my enterprise Linux doesn't play very well with lxml. pip installation: pip install lxml Collecting lxml Downloading lxml-3.4.4.tar.gz (3.5MB) 100% |################################| 3.5MB 92kB/s Installing collected packages: lxml Running setup.py install for lxml Successfully installed lxml-3.4.4 After pip installation: > python Python 3.4.1 (default, Nov 12 2014, 13:34:29)

The limit of Element Tree on xpath

 ̄綄美尐妖づ submitted on 2019-12-06 05:03:32
I've used ElementTree for a while and I love it for its simplicity, but I have doubts about its XPath implementation. This is the XML file: <a> <b name="b1"></b> <b name="b2"><c/></b> <b name="b2"></b> <b name="b3"></b> </a> The Python code: import xml.etree.ElementTree as ET tree = ET.parse('test.xml') root = tree.getroot() root.findall("b[@name='b2' and c]") The program raises the error: invalid predicate. But if I use root.findall("b[@name='b2']") or root.findall("b[c]"), it works. ElementTree provides limited support for XPath expressions. The goal is to support a small subset of the
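Two possible workarounds for the missing "and" operator, sketched against the XML from the question (parsed from a string here instead of test.xml). ElementTree allows one predicate to follow another, which acts as a logical AND; alternatively, the second condition can be checked in plain Python.

```python
import xml.etree.ElementTree as ET

XML = """<a>
  <b name="b1"></b>
  <b name="b2"><c/></b>
  <b name="b2"></b>
  <b name="b3"></b>
</a>"""
root = ET.fromstring(XML)

# Workaround 1: chain the predicates; both must hold for a match.
chained = root.findall("b[@name='b2'][c]")

# Workaround 2: query on the attribute, then filter on the child in Python.
filtered = [b for b in root.findall("b[@name='b2']") if b.find("c") is not None]

print(len(chained), len(filtered))
```

If full XPath 1.0 is needed, lxml's xpath() method accepts the original "and" expression.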

XML walking in python [closed]

余生颓废 submitted on 2019-12-06 04:13:08
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago. I am new to Python and would like to understand parsing XML. I have not been able to find any great examples or explanations of how to create a generic program to walk an XML nodeset. I want to be able to
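A generic walk can be sketched with a short recursive generator. This is only one reading of what "walk an XML nodeset" means here, since the question is cut off; the function name walk and the sample document are invented.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for an element and all descendants."""
    yield depth, element.tag, dict(element.attrib), (element.text or "").strip()
    for child in element:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


# Hypothetical sample document.
doc = ET.fromstring('<library><shelf id="1"><book>Dune</book></shelf></library>')
rows = list(walk(doc))
for depth, tag, attrib, text in rows:
    print("  " * depth + tag, attrib, text)
```

When the nesting depth is not needed, Element.iter() visits the same elements in the same document order without any recursion.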

How to read root XML tag in python

丶灬走出姿态 submitted on 2019-12-06 03:54:14
My question follows on from another Stack Overflow question: "How to get the root node of an xml file in Python?" from xml.etree import ElementTree as ET path = r'C:\cool.xml' et = ET.parse ( path ) root = et.getroot() When I extract and print the root tag, I receive: <Element 'root' at 0x1234abcd> I then want to check that the root tag has a certain title. How do I pull out just the tag name? If I try: if root == "root": print 'something' it doesn't work, so I assume I need to convert 'root' to text or something like that? I am very new to Python. root is an instance of the Element class. Any
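A minimal sketch of the comparison: an Element never equals a string, but its tag attribute is a plain string holding the tag name. The document is parsed from an inline string here as a stand-in for the file on disk.

```python
import xml.etree.ElementTree as ET

# Stand-in for ET.parse(path).getroot().
root = ET.fromstring("<root><child/></root>")

tag_name = root.tag  # plain str containing the tag name
if tag_name == "root":
    print("something")
```

If the document declares a namespace, the tag is stored as "{uri}root", so the comparison has to include the namespace part.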

How do you parse nested XML tags with python?

那年仲夏 submitted on 2019-12-06 03:36:18
Question: Please excuse me if I'm using the wrong terminology, but here's what I'm trying to accomplish. I'm trying to pull attribute and text information from nested tags in such as alias, payment, amount, etc. However, my example code block is only able to pull info from and not anything from the subelements in . How do I go about using elementtree to get to the subelements of my subelements? Once again, please excuse my terminology if I'm using it incorrectly. Example XML block: <root>
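Since the question's XML is cut off, the following sketch uses an invented document; only the names alias, payment and amount come from the question, while customer, id, method and currency are assumptions. It shows that find(), findtext() and findall() accept a path such as "payment/amount", which reaches the subelements of subelements.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; the real structure in the question is not shown.
XML = """<root>
  <customer id="42">
    <alias>jdoe</alias>
    <payment method="card">
      <amount currency="USD">19.99</amount>
    </payment>
  </customer>
</root>"""
root = ET.fromstring(XML)

records = []
for customer in root.findall("customer"):
    amount = customer.find("payment/amount")  # path reaches a grandchild
    records.append({
        "id": customer.get("id"),             # attribute on the outer tag
        "alias": customer.findtext("alias"),  # text of a direct child
        "method": customer.find("payment").get("method"),
        "amount": amount.text,
        "currency": amount.get("currency"),
    })

print(records)
```

find() returns None when nothing matches, so real data with optional elements needs a check before reading .text or calling .get().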

How to parse xml in Python on Google App Engine

早过忘川 submitted on 2019-12-06 03:35:36
Question: For the following XML, how do I fetch it and then parse it to get the value of <age>? <boardgames> <boardgame objectid="13"> <yearpublished>1995</yearpublished> <minplayers>3</minplayers> <maxplayers>4</maxplayers> <playingtime>90</playingtime> <age>10</age> <name sortindex="1">Catan</name> ... I'm currently trying: result = urlfetch.fetch(url=game_url) xml = ElementTree.fromstring(result.content) But I'm not sure I'm on the right path. When I try to parse I get errors (I think
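A sketch of the parsing half only. The bytes literal below stands in for result.content from the fetch, and the closing tags are an assumption, because the XML in the question is cut off at "...". The fromstring() call matches the one in the question; the value is then read with a relative path from each boardgame element.

```python
import xml.etree.ElementTree as ET

# Stand-in for result.content; closing tags added as an assumption.
content = b"""<boardgames>
  <boardgame objectid="13">
    <yearpublished>1995</yearpublished>
    <minplayers>3</minplayers>
    <maxplayers>4</maxplayers>
    <playingtime>90</playingtime>
    <age>10</age>
    <name sortindex="1">Catan</name>
  </boardgame>
</boardgames>"""

root = ET.fromstring(content)

# Map each game's objectid attribute to the text of its <age> child.
ages = {game.get("objectid"): game.findtext("age")
        for game in root.findall("boardgame")}
print(ages)
```

findtext() returns a string (or None if the element is missing), so the age needs int() before any numeric comparison.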