lxml | 易学教程

Python lxml.html XPath “attribute not equal” operator not working as expected

Question: I'm trying to run the following script:

    #!python
    from urllib import urlopen  # urllib.request for python3
    from lxml import html

    url = 'http://mpk.lodz.pl/rozklady/1_11_D2D3/00d2/00d2t001.htm?r=KOZINY'+\
          '%20-%20Srebrzy%F1ska,%20Cmentarna,%20Legion%F3w,%20pl.%20Wolno%B6ci'+\
          ',%20Pomorska,%20Kili%F1skiego,%20Przybyszewskiego%20-%20LODOWA'
    raw_html = urlopen(url).read()
    tree = html.fromstring(raw_html)  # need to .decode('windows-1250') in python3
    ret = tree.xpath('//td [@class!="naglczas"]')
    print
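
One common source of surprise with "attribute not equal" tests in XPath, which may or may not be what the question ran into, is that `@class != "x"` and `not(@class = "x")` select different nodes. The sketch below uses a small made-up table (the markup and the `czas` class name are assumptions, not the real timetable page) to show the difference.

```python
from lxml import html

# Hypothetical markup standing in for the timetable page in the question.
doc = html.fromstring(
    '<table><tr>'
    '<td class="naglczas">header</td>'
    '<td class="czas">12:05</td>'
    '<td>no class</td>'
    '</tr></table>'
)

# True only for cells that HAVE a class attribute whose value differs.
with_other_class = doc.xpath('//td[@class!="naglczas"]')

# Also true for cells that have no class attribute at all.
not_header = doc.xpath('//td[not(@class="naglczas")]')

other_texts = [td.text for td in with_other_class]
not_header_texts = [td.text for td in not_header]
print(other_texts)
print(not_header_texts)
```

If the goal is "every cell except the header cells", the `not(...)` form is usually the one to reach for, since it also keeps cells that carry no class attribute.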

lxml and loops to create xml rss in python

I have been using lxml to create the XML of an RSS feed, but I am having trouble with the tags and can't figure out how to add a dynamic number of elements. Given that lxml seems to just have functions as parameters of functions, I can't see how to loop over a dynamic number of items without remaking the entire page.

    rss = page = (
        E.rss(
            E.channel(
                E.title("Page Title"),
                E.link(""),
                E.description(""),
                E.item(
                    E.title("Hello!!!!!!!!!!!!!!!!!!!!! "),
                    E.link("htt://"),
                    E.description("this is a"),
                ),
            )
        )
    )

Jason has answered your question; but – just FYI – you can pass any
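
A minimal sketch of one way to add a variable number of items, assuming the `E` in the question is `lxml.builder.E`. The `entries` list and its keys are hypothetical placeholders for whatever data drives the feed; this is not necessarily the approach the truncated answer above goes on to describe.

```python
from lxml import etree
from lxml.builder import E

# Hypothetical feed entries; in practice these would come from your own data.
entries = [
    {"title": "First post", "link": "http://example.com/1", "description": "One"},
    {"title": "Second post", "link": "http://example.com/2", "description": "Two"},
]

# Build the fixed part of the channel once.
channel = E.channel(
    E.title("Page Title"),
    E.link(""),
    E.description(""),
)

# E.* calls return ordinary elements, so items can be appended in a loop.
for entry in entries:
    channel.append(
        E.item(
            E.title(entry["title"]),
            E.link(entry["link"]),
            E.description(entry["description"]),
        )
    )

rss = E.rss(channel)
print(etree.tostring(rss, pretty_print=True).decode())
```

Because the builder calls are plain function calls, a prepared list of items can also be unpacked into the call with `*`, for example `E.channel(E.title("Page Title"), *items)`.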

XML pretty print fails in Python lxml

I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6. My code:

    import lxml.etree as et
    fn_xml_in = 'in.xml'
    parser = et.XMLParser(remove_blank_text=True)
    xml_doc = et.parse(fn_xml_in, parser)
    xml_doc.getroot().find('b').append(et.Element('c'))
    xml_doc.write('out.xml', method='html', pretty_print=True)

The input file in.xml looks like this: <a> </a> And the produced output file out.xml : <a> <c></c> </a> Or when I set remove_blank_text=True : <a><c></c></a> I would have expected lxml to insert line breaks and indentation within the b element: <a>

lxml unicode entity parse problems

I'm using lxml as follows to parse an exported XML file from another system:

    xmldoc = open(filename)
    etree.parse(xmldoc)

But I'm getting:

    lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

Obviously it's having problems with Unicode entity names, but how would I get around this? Via open() or parse()? Edit: I had forgotten to include my DTD in the same folder. It's there now and has the following declaration:

    <!ENTITY eacute "é">

and is referred to (and always was) in xmldoc as so:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE DScribeDatabase SYSTEM "foo.dtd"

lxml.etree insert elements into element.text

I have strings that have empty XML elements in them, like this:

    >>> s = """fizz buzz <pb n="44"/> bananas"""

These strings have been assigned to XML elements using the etree.SubElement method:

    >>> from lxml import etree as et
    >>> root = et.Element('root')
    >>> txt = et.SubElement(root, 'text')
    >>> txt.text = s
    >>> et.dump(root)
    <root>
      <text>fizz buzz &lt;pb n="44"/&gt; bananas</text>
    </root>

Fiddling about with re.split() and etree's text and tail I can insert a subelement <pb n="44"/> where I want it in txt.text; however, sometimes I've got multiple occurrences of the <pb/> element in the string,
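
A minimal sketch of one alternative to splitting on text and tail by hand: parse the string as the content of a wrapper element, so that any number of embedded `<pb/>` tags become real child elements. This assumes each string is well-formed XML once wrapped (no stray `&` or `<`); the second `<pb n="45"/>` and the trailing word are made up for illustration.

```python
from lxml import etree as et

# Sample string with two embedded empty elements (the second one is hypothetical).
s = 'fizz buzz <pb n="44"/> bananas <pb n="45"/> cherries'

root = et.Element('root')

# Parse the string inside a <text> wrapper instead of assigning it to .text,
# which would escape the angle brackets.
txt = et.fromstring('<text>{}</text>'.format(s))
root.append(txt)

serialized = et.tostring(root).decode()
print(serialized)
```

The parser takes care of text and tail: the words before the first `<pb/>` end up in `txt.text`, and the words after each `<pb/>` end up in that element's `tail`.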

lipo: can't figure out the architecture type of: /var/folders/

I tried installing lxml on Mac OS X Snow Leopard and keep getting the error:

    lipo: can't figure out the architecture type of: /var/folders/

I did install Xcode with 10.4 SDK support and I changed gcc 4.2 to 4.0.1. Any clues? Python 2.6.1 with Leopard 1.6.7..

    running install
    running bdist_egg
    running egg_info
    writing src/lxml.egg-info/PKG-INFO
    writing top-level names to src/lxml.egg-info/top_level.txt
    writing dependency_links to src/lxml.egg-info/dependency_links.txt
    reading manifest file 'src/lxml.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching

Regular Expressions to parse template tags in XML

I need to parse some XML to pull out embedded template tags for further parsing. I can't seem to bend Python's regular expressions to do what I want, though. In English: when a template tag is contained anywhere in the row, remove all the XML for that specific row and leave only the template tag in its place. I put together a test case to demonstrate. Here's the original XML:

    <w:tbl>
      <w:tr>
        <w:tc><w:t>Header 1</w:t></w:tc>
        <w:tc><w:t>Header 2</w:t></w:tc>
        <w:tc><w:t>Header 3</w:t></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:t>{% for i in items %}</w:t></w:tc>
        <w:tc><w:t></w:t></w

python lxml findall with multiple namespaces

I'm trying to parse an XML document with multiple namespaces with lxml, and I'm stuck on getting the findall() method to return something. My XML:

    <MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
      <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
        <List>
          <HistoryRecord>
            <Value>60</Value>
            <State>Valid</State>
            <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
          </HistoryRecord>
        </List>
        <
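
A minimal sketch of how `findall()` can be made to match elements in a default namespace, using a shortened stand-in for the document above. The `rsp` prefix is an arbitrary name chosen for this example; only the namespace URI has to match the document.

```python
from lxml import etree

# Shortened, hypothetical stand-in for the document in the question.
xml = b'''<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>'''

root = etree.fromstring(xml)

# Map a prefix of our choosing to the document's default namespace URI.
ns = {"rsp": "http://www.company.com/common/rsp/2012/07"}

# Every step in the path needs the prefix, because every element is namespaced.
records = root.findall(".//rsp:HistoryRecord", namespaces=ns)
values = [rec.findtext("rsp:Value", namespaces=ns) for rec in records]

# The same lookup written in Clark notation, without a prefix map.
same_records = root.findall(
    ".//{http://www.company.com/common/rsp/2012/07}HistoryRecord"
)

print(values)
```

An unprefixed path such as `.//HistoryRecord` looks for elements in no namespace, which is why it returns an empty list for a document like this one.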

lxml.etree._Element.append() from a loop not working as expected

I would like to know why in this code append() seems to work from inside the loop, but the resulting XML displays the modification from only the last iteration, while remove() works as expected. This is an overly simplified example; I'm working with big chunks of data and need to append the same subtree to many different parents.

    from lxml import etree
    xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>')
    sub = etree.fromstring('<apple/>')
    for i, item in enumerate(xml):
        item.append(sub)
        print('Fruit {} with sub appended: {}'.format(
            i, etree.tostring(item).decode(
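
In lxml an element has exactly one parent, so appending an element that already sits in a tree moves it rather than duplicating it. That matches the behaviour described above, where only the last parent keeps the subtree. A minimal sketch of one way around it is to append a deep copy on each iteration:

```python
import copy

from lxml import etree

xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>')
sub = etree.fromstring('<apple/>')

for item in xml:
    # Append a fresh copy each time; appending `sub` itself would move it
    # out of the previous parent.
    item.append(copy.deepcopy(sub))

result = etree.tostring(xml).decode()
print(result)
```

For large subtrees the copies cost memory in proportion to the number of parents, which may matter when working with big chunks of data.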