lxml | 易学教程

python alexa result parsing with lxml.etree

Question: I am using the Alexa API from AWS, but I find it difficult to parse the result to get what I want. The Alexa API returns an object tree of type <type 'lxml.etree._ElementTree'>. I use this code to print the tree:

    from lxml import etree
    root = tree.getroot()
    print etree.tostring(root)

I get the XML below:

    <aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"><aws:OperationRequest><aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85<
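The excerpt is cut off, but the visible XML already shows why naive lookups tend to fail: the `aws` prefix is bound to one namespace URI on the outer element and rebound to a different URI on `aws:Response`. The following is a minimal sketch of a namespace-aware lookup, not the accepted answer. The closing tags in `SAMPLE` and the prefix names `alexa` and `awis` are assumptions made for illustration.

```python
from lxml import etree

# Built from the visible part of the response; the closing tags are assumed.
SAMPLE = (
    '<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    '<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">'
    '<aws:OperationRequest>'
    '<aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>'
    '</aws:OperationRequest>'
    '</aws:Response>'
    '</aws:UrlInfoResponse>'
)

root = etree.fromstring(SAMPLE)

# XPath matches on namespace URIs, not on the prefixes used in the document,
# so we map our own (hypothetical) prefixes to the two URIs.
NS = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

# RequestId sits below aws:Response, so it belongs to the second namespace.
request_ids = root.xpath("//awis:RequestId/text()", namespaces=NS)
request_id = request_ids[0] if request_ids else None
print(request_id)
```

With a real response, `root` would come from `tree.getroot()` as in the question, and the same `namespaces` mapping could be passed to `find()` and `findall()` as well.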

Python create XML from Csv within a loop

Question: I am trying to create an XML file from a CSV file.

CSV:

    CatOne, CatTwo, CatThree
    ProdOne, ProdTwo, ProdThree
    ProductOne, ProductTwo, ProductThree

Desired XML:

    <root>
      <prod>
        <CatOne>ProdOne</CatOne>
        <CatTwo>ProdTwo</CatTwo>
        <CatThree>ProdThree</CatThree>
      </prod>
      <prod>
        <CatOne>ProductOne</CatOne>
        <CatTwo>ProductTwo</CatTwo>
        <CatThree>ProductThree</CatThree>
      </prod>
    </root>

Here is my code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import csv, sys, os
    from lxml import etree

    def main():
        csvFile = 'test
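The asker's own code is truncated, so the following is only a sketch of one way to get from the CSV shown to the desired XML. It reads the sample from an in-memory string instead of a file, and the function name `csv_to_xml` is hypothetical.

```python
import csv
import io

from lxml import etree

CSV_TEXT = """CatOne, CatTwo, CatThree
ProdOne, ProdTwo, ProdThree
ProductOne, ProductTwo, ProductThree
"""


def csv_to_xml(csv_text):
    """Build <root><prod>...</prod></root> with one <prod> per data row."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)  # first row supplies the element names
    root = etree.Element("root")
    for row in reader:
        if not row:  # ignore empty lines
            continue
        prod = etree.SubElement(root, "prod")
        for tag, value in zip(header, row):
            etree.SubElement(prod, tag).text = value
    return root


root = csv_to_xml(CSV_TEXT)
xml_text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(xml_text)
```

Creating a fresh `prod` element inside the loop is the key design choice: reusing one element across iterations would overwrite the earlier rows instead of appending new ones.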

Decode base64 string in python 3 (with lxml or not)

Question: I know this looks embarrassingly easy, and I guess the problem is that I don't yet have a clear understanding of all this bytes/str/unicode (and, frankly, encoding/decoding) stuff. I've been trying to get my working code to run on Python 3. The part I'm stuck on is where I parse an XML document with lxml and decode a base64 string contained in that XML. The code currently works as follows: I retrieve the binary data with an XPath query '.../binary/text()'. This produces a one
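A minimal sketch of the str-to-bytes flow in Python 3, assuming a tiny hypothetical document (`<doc><binary>...</binary></doc>`) in place of the asker's XML, whose structure is not shown in the excerpt. Only the element name `binary` comes from the question.

```python
import base64

from lxml import etree

# Hypothetical payload and document, used only to make the sketch runnable.
payload = b"hello lxml"
encoded = base64.b64encode(payload).decode("ascii")  # bytes -> ASCII str
doc = etree.fromstring("<doc><binary>{}</binary></doc>".format(encoded))

# text() returns a list of str results (text, not bytes) in Python 3.
texts = doc.xpath("/doc/binary/text()")

# b64decode accepts an ASCII-only str and always returns bytes.
raw = base64.b64decode(texts[0])

# Decode to str only if the payload is known to be text in that encoding.
text = raw.decode("utf-8")
print(raw, text)
```

The rule of thumb illustrated here: the base64 text in the XML is `str`, the decoded payload is `bytes`, and a further `.decode()` is needed only when the payload itself is text.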

python - find xpath of element containing string

Question: I built a small script that is supposed to find a specific string in a page and return the XPath of the element containing that string. The purpose is to use this XPath to find strings in the same context. I'm using this code:

    import requests
    from lxml import html

    page = requests.get("http://www.w3schools.com/xpath/")
    tree = html.fromstring(page.text)
    result = tree.xpath('//*[. = "XML"]')

result[0] returns <Element b at 0x7f034a08e940>, and I can't figure out how to find this element's XPath
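One way to go from an element back to a path is `getpath()` on the element's `ElementTree`. The sketch below uses a small inline page instead of a network request, so `PAGE` is an assumption and not the content of the site in the question.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page.
PAGE = (
    "<html><body><div>"
    "<p>Learn <b>XML</b> today</p>"
    "<p>Other</p>"
    "</div></body></html>"
)

tree = html.fromstring(PAGE)

# Same query as in the question: elements whose string value is exactly "XML".
matches = tree.xpath('//*[. = "XML"]')
element = matches[0]

# getpath() lives on the ElementTree, not on the element itself.
path = tree.getroottree().getpath(element)
print(path)

# The generated path can be evaluated again to reach the same element.
round_trip = tree.xpath(path)
```

The path that `getpath()` produces is purely positional, so it identifies this element in this document; reusing it on a page with a different layout is not guaranteed to land on an equivalent node.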

how to get the full contents of a node using xpath & lxml?

Question: I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a <font> tag, which includes HTML tags of its own. If I use

    //td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]

I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>). If I use

    //td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()

I get exactly what I want, except that I don't get any

Writing with lxml emitting no whitespace even when pretty_print=True

Question: I'm using the lxml library to read an XML template, insert or change some elements, and save the resulting XML. One of the elements is created on the fly using the etree.Element and etree.SubElement methods:

    tree = etree.parse(r'xml_archive\templates\metadata_template_pts.xml')
    root = tree.getroot()
    stream = []
    for element in root.iter():
        if isinstance(element.tag, basestring):
            stream.append(element.tag)

            # Find "keywords" element and insert a new "theme" element
            if element.tag ==
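The excerpt stops before the symptom is fully described, so the following rests on an assumption: a frequent cause of `pretty_print=True` appearing to do nothing is that the parsed template already contains whitespace-only text nodes, which lxml preserves and will not re-indent. Parsing with `remove_blank_text=True` drops them. The template below is hypothetical; only the names `keywords` and `theme` come from the question.

```python
from lxml import etree

# Hypothetical template with its own indentation, standing in for the file.
TEMPLATE = (
    b"<metadata>\n"
    b"  <keywords>\n"
    b"    <theme>a</theme>\n"
    b"  </keywords>\n"
    b"</metadata>"
)

# Drop ignorable whitespace between elements while parsing.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(TEMPLATE, parser)

# Insert a new element on the fly, as in the question.
keywords = root.find("keywords")
new_theme = etree.SubElement(keywords, "theme")
new_theme.text = "b"

# With the blank text gone, pretty_print can indent every element.
out = etree.tostring(root, pretty_print=True, encoding="unicode")
print(out)
```

For a file on disk, the same parser can be passed as the second argument to `etree.parse()`.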

Using Python and lxml to validate XML against an external DTD

Question: I'm trying to validate an XML file against an external DTD referenced in the doctype tag. Specifically:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
    ...the rest of the document...

I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I've thrown this together:

    enexFile = open(sys.argv[2], mode="rb")  # sys.argv[2] is the path to an XML file in local storage.
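A self-contained sketch of DTD validation with `etree.DTD`, which validates an already parsed document against a DTD loaded separately. To avoid a network fetch, it uses a tiny made-up DTD: only the root name `en-export` comes from the question, and the `note` declaration is a hypothetical simplification, not the real Evernote DTD.

```python
import io

from lxml import etree

# Hypothetical, heavily simplified DTD used only for illustration.
DTD_TEXT = "<!ELEMENT en-export (note*)>\n<!ELEMENT note (#PCDATA)>"
dtd = etree.DTD(io.StringIO(DTD_TEXT))

valid_doc = etree.fromstring("<en-export><note>hi</note></en-export>")
invalid_doc = etree.fromstring("<en-export><oops/></en-export>")

# validate() returns a bool; details of a failure end up in dtd.error_log.
is_valid = dtd.validate(valid_doc)
is_invalid = dtd.validate(invalid_doc)

print(is_valid, is_invalid)
for entry in dtd.error_log.filter_from_errors():
    print(entry.line, entry.message)
```

To validate at parse time against the DTD named in the doctype, a parser created with `etree.XMLParser(dtd_validation=True)` is the documented route. If the DTD lives at an HTTP URL, as here, `no_network=False` is probably needed as well, since lxml parsers block network access by default.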

How to parse malformed HTML in python

Question: I need to browse the DOM tree of a parsed HTML document. I'm using uTidyLib before parsing the string with lxml:

    a = tidy.parseString(html_code, options)
    dom = etree.fromstring(str(a))

Sometimes I get an error; it seems that tidylib is not able to repair malformed HTML. How can I parse every HTML file without getting an error (parsing only those parts of the files that can be repaired)?

Answer 1: Beautiful Soup does a good job with invalid or broken HTML.

    >>> from BeautifulSoup import BeautifulSoup
    >>>
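As an alternative to the Beautiful Soup answer excerpted above (which is cut off), lxml's own HTML parser is also tolerant of broken markup, so the tidy step may not be needed at all. This is a sketch of that swapped-in approach, not the answer's code, and the `BROKEN` sample is made up.

```python
from lxml import etree

# Hypothetical broken markup: unclosed <p>, mis-nested <b>/<i>, no closing tags.
BROKEN = "<html><body><p>unclosed paragraph<div><b>bold<i>nested</b></div>"

# etree.fromstring() defaults to the strict XML parser; pass an HTML parser.
# recover=True is the HTMLParser default, spelled out here for clarity.
parser = etree.HTMLParser(recover=True)
dom = etree.fromstring(BROKEN, parser)

# Walk the repaired tree, skipping comments and processing instructions.
tags = [el.tag for el in dom.iter() if isinstance(el.tag, str)]
all_text = "".join(dom.itertext())

print(tags)
print(all_text)
```

The likely source of the original error is that `etree.fromstring()` without a parser argument expects well-formed XML, which even tidied HTML does not always satisfy.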