lxml

python alexa result parsing with lxml.etree

冷眼眸甩不掉的悲伤 submitted on 2019-12-31 04:26:33
Question: I am using the Alexa API from AWS, but I am having difficulty parsing the result to get what I want. The Alexa API returns an object tree of type <type 'lxml.etree._ElementTree'>. I use this code to print the tree:
    from lxml import etree
    root = tree.getroot()
    print etree.tostring(root)
I get the XML below:
    <aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"><aws:OperationRequest><aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85<
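The excerpt is cut off before any answer. As a minimal sketch of one way to pull a value out of such a response, the example below runs a namespace-aware XPath query. The sample document is a shortened, hypothetical stand-in: only the two namespace URIs and the element names come from the question, while the request ID value and the prefixes `outer` and `awis` are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the real response. Only the namespace
# URIs and element names are taken from the question.
SAMPLE = (
    b'<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    b'<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">'
    b'<aws:OperationRequest>'
    b'<aws:RequestId>example-request-id</aws:RequestId>'
    b'</aws:OperationRequest>'
    b'</aws:Response>'
    b'</aws:UrlInfoResponse>'
)

root = etree.fromstring(SAMPLE)

# The document rebinds the "aws" prefix on the inner element, so the query
# uses its own prefixes (chosen here) mapped to the two namespace URIs.
NS = {
    "outer": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

# RequestId lives in the inner (awis) namespace.
request_ids = root.xpath("//awis:RequestId/text()", namespaces=NS)
print(request_ids)
```

The prefixes used in the query do not have to match those in the document; only the namespace URIs have to agree.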

Python create XML from Csv within a loop

你说的曾经没有我的故事 submitted on 2019-12-31 04:26:08
Question: I am trying to create an XML file from a CSV file.
CSV:
    CatOne, CatTwo, CatThree
    ProdOne, ProdTwo, ProdThree
    ProductOne, ProductTwo, ProductThree
Desired XML:
    <root>
      <prod>
        <CatOne>ProdOne</CatOne>
        <CatTwo>ProdTwo</CatTwo>
        <CatThree>ProdThree</CatThree>
      </prod>
      <prod>
        <CatOne>ProductOne</CatOne>
        <CatTwo>ProductTwo</CatTwo>
        <CatThree>ProductThree</CatThree>
      </prod>
    </root>
Here is my code:
    #! usr/bin/python
    # -*- coding: utf-8 -*-
    import csv, sys, os
    from lxml import etree
    def main():
        csvFile = 'test
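The asker's code is truncated, so the following is not a repair of it. It is a minimal sketch of one way to produce the desired XML, assuming the first CSV row holds the element names and every later row becomes one `<prod>` element. The function name `csv_to_xml` and the in-memory `CSV_TEXT` are hypothetical and stand in for the real file.

```python
import csv
import io

from lxml import etree

# In-memory copy of the CSV from the question (stands in for the real file).
CSV_TEXT = (
    "CatOne, CatTwo, CatThree\n"
    "ProdOne, ProdTwo, ProdThree\n"
    "ProductOne, ProductTwo, ProductThree\n"
)


def csv_to_xml(csv_file):
    """Build <root><prod>...</prod>...</root> from an open CSV file object."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(csv_file, skipinitialspace=True)
    header = next(reader)  # first row supplies the tag names
    root = etree.Element("root")
    for row in reader:
        prod = etree.SubElement(root, "prod")
        for tag, value in zip(header, row):
            etree.SubElement(prod, tag).text = value
    return root


root = csv_to_xml(io.StringIO(CSV_TEXT))
xml_text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(xml_text)
```

Creating the `<prod>` element inside the row loop, rather than once before it, is what gives each CSV row its own element.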

Decode base64 string in python 3 (with lxml or not)

一世执手 submitted on 2019-12-30 17:27:16
Question: I know this looks embarrassingly easy, and I guess the problem is that I just don't have a clear understanding of all this bytes-str-unicode (and, speaking frankly, encoding-decoding) stuff yet. I've been trying to get my working code to run on Python 3. The part I'm stuck on is where I parse an XML document with lxml and decode a base64 string that is in that XML. The code currently works in the following manner: I retrieve the binary data with an XPath query '.../binary/text()'. This produces a one
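The excerpt stops mid-sentence, so the asker's actual code is unknown. As a minimal sketch under that caveat: in Python 3, a `text()` XPath query in lxml returns a list of strings, and `base64.b64decode` accepts an ASCII string and returns `bytes`. The document layout (`<root><binary>`) and the payload are hypothetical.

```python
import base64

from lxml import etree

# Hypothetical document: a <binary> element holding base64 text.
payload = base64.b64encode(b"hello lxml").decode("ascii")
doc = etree.fromstring("<root><binary>{}</binary></root>".format(payload))

# text() yields a list of string results, one per matching text node.
texts = doc.xpath("/root/binary/text()")

# b64decode takes the ASCII string directly and returns bytes.
raw = base64.b64decode(texts[0])
print(raw)
```

If the decoded bytes are themselves text, a further `raw.decode("utf-8")` (or whatever encoding applies) turns them into a `str`.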

python - find xpath of element containing string

别说谁变了你拦得住时间么 submitted on 2019-12-30 10:34:49
Question: I built a small script that is supposed to find a specific string in a page and return the XPath of the element containing that string. The purpose is to use this XPath to find strings in the same context. I'm using this code:
    import requests
    from lxml import html
    page = requests.get("http://www.w3schools.com/xpath/")
    tree = html.fromstring(page.text)
    result = tree.xpath('//*[. = "XML"]')
result[0] returns <Element b at 0x7f034a08e940> and I can't figure out how to find this element's XPath
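No answer survives in this excerpt. One option lxml offers is the `getpath()` method of an element tree, which returns a structural XPath for a given element. The sketch below uses a small inline HTML string instead of the live page, so the markup is an assumption and no network request is made.

```python
from lxml import html

# Hypothetical page content standing in for the fetched document.
PAGE = "<html><body><div><p>Intro</p><b>XML</b></div></body></html>"

root = html.fromstring(PAGE)
result = root.xpath('//*[. = "XML"]')

# getpath() lives on the ElementTree, so go from the element to its tree.
tree = root.getroottree()
path = tree.getpath(result[0])
print(path)

# The returned path can be fed back into xpath() to reach the same element.
same = root.xpath(path)
```

The path is purely positional, so it identifies where the element sits in this document rather than describing its attributes or text.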

how to get the full contents of a node using xpath & lxml?

旧城冷巷雨未停 submitted on 2019-12-30 09:42:11
Question: I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a <font> tag, which includes HTML tags of its own. If I use
    //td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]
I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>). If I use
    //td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()
I get exactly what I want, except that I don't get any

Writing with lxml emitting no whitespace even when pretty_print=True

試著忘記壹切 submitted on 2019-12-30 07:42:26
Question: I'm using the lxml library to read an XML template, insert/change some elements, and save the resulting XML. One of the elements which I'm creating on the fly using the etree.Element and etree.SubElement methods:
    tree = etree.parse(r'xml_archive\templates\metadata_template_pts.xml')
    root = tree.getroot()
    stream = []
    for element in root.iter():
        if isinstance(element.tag, basestring):
            stream.append(element.tag)
            # Find "keywords" element and insert a new "theme" element
            if element.tag ==
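The excerpt ends before the rest of the code or any answer. A commonly cited cause of this symptom is that whitespace-only text nodes already present in the parsed template stop `pretty_print` from re-indenting. The sketch below parses with `remove_blank_text=True` before adding an element. The inline template and its `metadata` root are hypothetical; only the `keywords` and `theme` names come from the question.

```python
from lxml import etree

# Hypothetical template; a real one would be read from disk.
TEMPLATE = (
    b"<metadata>\n"
    b"  <keywords>\n"
    b"    <theme>existing</theme>\n"
    b"  </keywords>\n"
    b"</metadata>"
)

# Drop ignorable whitespace between elements so pretty_print can re-indent.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(TEMPLATE, parser)

# Add a new element on the fly, as in the question.
keywords = root.find("keywords")
theme = etree.SubElement(keywords, "theme")
theme.text = "added"

output = etree.tostring(root, pretty_print=True, encoding="unicode")
print(output)
```

The same parser object can be passed to `etree.parse(path, parser)` when the template is read from a file.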

Using Python and lxml to validate XML against an external DTD

点点圈 submitted on 2019-12-30 07:29:07
Question: I'm trying to validate an XML file against an external DTD referenced in the doctype tag. Specifically:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
    ...the rest of the document...
I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I've put this together:
    enexFile = open(sys.argv[2], mode="rb") # sys.argv[2] is the path to an XML file in local storage.
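The rest of the asker's code is not shown. For validation at parse time, `etree.XMLParser` has `dtd_validation` and `no_network` options that are relevant when the DTD sits behind an HTTP URL. The sketch below takes a different route that needs no network access: it loads a DTD explicitly with `etree.DTD` and validates already-parsed documents. The tiny DTD and both documents are hypothetical and unrelated to the Evernote DTD.

```python
import io

from lxml import etree

# Hypothetical DTD: a <note> must contain exactly one <title>.
DTD_TEXT = "<!ELEMENT note (title)>\n<!ELEMENT title (#PCDATA)>"
dtd = etree.DTD(io.StringIO(DTD_TEXT))

valid_doc = etree.fromstring("<note><title>ok</title></note>")
invalid_doc = etree.fromstring("<note><body>unexpected</body></note>")

# validate() returns True or False; details end up in dtd.error_log.
is_valid = dtd.validate(valid_doc)
is_invalid_accepted = dtd.validate(invalid_doc)
print(is_valid, is_invalid_accepted)

for error in dtd.error_log.filter_from_errors():
    print(error.message)
```

A downloaded copy of the real DTD could be loaded the same way by passing an open file object to `etree.DTD`.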

How to parse malformed HTML in python

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-30 05:57:51
Question: I need to browse the DOM tree of a parsed HTML document. I'm using uTidyLib before parsing the string with lxml:
    a = tidy.parseString(html_code, options)
    dom = etree.fromstring(str(a))
Sometimes I get an error; it seems that tidylib is not able to repair malformed HTML. How can I parse every HTML file without getting an error (parsing only the parts of a file that cannot be repaired)?
Answer 1: Beautiful Soup does a good job with invalid/broken HTML
    >>> from BeautifulSoup import BeautifulSoup >>>
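The answer above is truncated. As an alternative that stays within lxml, and not part of the quoted answer, the sketch below uses `lxml.html.fromstring`, which relies on a lenient HTML parser instead of the strict XML parser behind `etree.fromstring`. The broken markup is a made-up example.

```python
from lxml import html

# Made-up fragment with unclosed <p> and <b> tags.
BROKEN = "<div><p>unclosed paragraph <b>bold text<p>second paragraph</div>"

# The HTML parser recovers from missing end tags instead of raising.
dom = html.fromstring(BROKEN)

bold_elements = dom.xpath("//b")
all_text = dom.text_content()
print(dom.tag, len(bold_elements))
print(all_text)
```

How the parser chooses to close the dangling tags is up to the underlying library, so the resulting tree is worth inspecting before relying on its exact shape.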