lxml

Python lxml.html XPath “attribute not equal” operator not working as expected

老子叫甜甜 submitted on 2019-12-02 03:33:13
Question: I'm trying to run the following script: #!python from urllib import urlopen #urllib.request for python3 from lxml import html url = 'http://mpk.lodz.pl/rozklady/1_11_D2D3/00d2/00d2t001.htm?r=KOZINY'+\ '%20-%20Srebrzy%F1ska,%20Cmentarna,%20Legion%F3w,%20pl.%20Wolno%B6ci'+\ ',%20Pomorska,%20Kili%F1skiego,%20Przybyszewskiego%20-%20LODOWA' raw_html = urlopen(url).read() tree = html.fromstring(raw_html) #need to .decode('windows-1250') in python3 ret = tree.xpath('//td [@class!="naglczas"]') print
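The excerpt is cut off before it states the actual symptom, so the following is only a sketch of one common cause. In XPath 1.0, `@class!="naglczas"` is true only for elements that have a `class` attribute with a different value; elements with no `class` attribute at all are skipped, whereas `not(@class="naglczas")` includes them. The sample markup below is hypothetical and stands in for the downloaded timetable page.

```python
from lxml import html

# Hypothetical markup: one cell with the excluded class, one with another
# class, and one with no class attribute at all.
doc = html.fromstring(
    '<table><tr>'
    '<td class="naglczas">a</td>'
    '<td class="other">b</td>'
    '<td>c</td>'
    '</tr></table>'
)

# "!=" needs the attribute to exist, so the class-less cell is not matched.
with_ne = [td.text for td in doc.xpath('//td[@class!="naglczas"]')]

# not(...) negates the whole comparison, so the class-less cell is matched too.
with_not = [td.text for td in doc.xpath('//td[not(@class="naglczas")]')]

print(with_ne)
print(with_not)
```

If the intent is "every cell except the ones marked `naglczas`", the `not(...)` form is usually the one that matches that intent.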

lxml and loops to create xml rss in python

筅森魡賤 submitted on 2019-12-02 03:24:38
I have been using lxml to create the XML of an RSS feed, but I am having trouble with the tags and can't figure out how to add a dynamic number of elements. Given that lxml seems to just have functions as parameters of functions, I can't figure out how to loop over a dynamic number of items without remaking the entire page. rss = page = ( E.rss( E.channel( E.title("Page Title"), E.link(""), E.description(""), E.item( E.title("Hello!!!!!!!!!!!!!!!!!!!!! "), E.link("htt://"), E.description("this is a"), ), ) ) ) Jason has answered your question, but just FYI, you can pass any
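The answer quoted above is truncated, so what follows is a sketch of one way to do this rather than a reconstruction of it. The `E` factory from `lxml.builder` takes its children as positional arguments, so a list built in a loop can be unpacked with `*`. The `entries` data is hypothetical.

```python
from lxml import etree
from lxml.builder import E

# Hypothetical feed entries: (title, link, description).
entries = [
    ("First post", "http://example.com/1", "the first one"),
    ("Second post", "http://example.com/2", "the second one"),
]

# Build one <item> per entry in an ordinary loop / comprehension.
items = [
    E.item(E.title(title), E.link(link), E.description(desc))
    for title, link, desc in entries
]

# Unpack the list as extra children of <channel>.
rss = E.rss(
    E.channel(
        E.title("Page Title"),
        E.link(""),
        E.description(""),
        *items
    )
)

print(etree.tostring(rss, pretty_print=True).decode())
```

Items can also be added after the fact with `channel.append(...)`, which avoids rebuilding the page when entries arrive incrementally.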

lxml and loops to create xml rss in python

你说的曾经没有我的故事 submitted on 2019-12-02 03:10:57
Question: I have been using lxml to create the XML of an RSS feed, but I am having trouble with the tags and can't figure out how to add a dynamic number of elements. Given that lxml seems to just have functions as parameters of functions, I can't figure out how to loop over a dynamic number of items without remaking the entire page. rss = page = ( E.rss( E.channel( E.title("Page Title"), E.link(""), E.description(""), E.item( E.title("Hello!!!!!!!!!!!!!!!!!!!!! "), E.link("htt://"), E

XML pretty print fails in Python lxml

 ̄綄美尐妖づ submitted on 2019-12-02 02:28:55
I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6. My code: import lxml.etree as et fn_xml_in = 'in.xml' parser = et.XMLParser(remove_blank_text=True) xml_doc = et.parse(fn_xml_in, parser) xml_doc.getroot().find('b').append(et.Element('c')) xml_doc.write('out.xml', method='html', pretty_print=True) The input file in.xml looks like this: <a> <b/> </a> And the produced output file out.xml : <a> <b><c></c></b> </a> Or when I set remove_blank_text=True : <a><b><c></c></b></a> I would have expected lxml to insert line breaks and indentation within the b element: <a>
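The excerpt does not say what the fix was. As a hedged sketch, the combination below generally produces the expected indentation: parse with `remove_blank_text=True` so existing whitespace-only text nodes do not block re-indentation, then serialize with the default XML method instead of `method='html'`. The input is inlined as a string here in place of `in.xml`.

```python
import lxml.etree as et

# Same document as in.xml from the question, inlined for the example.
source = '<a>\n  <b/>\n</a>'

# Drop ignorable whitespace so the serializer is free to re-indent.
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring(source, parser)
root.find('b').append(et.Element('c'))

# Default method is 'xml'; pretty_print adds line breaks and indentation.
out = et.tostring(root, pretty_print=True).decode()
print(out)
```

When writing to a file instead, the equivalent call would be `root.getroottree().write('out.xml', pretty_print=True)`.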

lxml unicode entity parse problems

僤鯓⒐⒋嵵緔 submitted on 2019-12-02 02:20:08
I'm using lxml as follows to parse an exported XML file from another system: xmldoc = open(filename) etree.parse(xmldoc) But I'm getting: lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46 Obviously it's having problems with unicode entity names, but how would I get around this? Via open() or parse()? Edit: I had forgotten to include my DTD in the same folder - it's there now and has the following declaration: <!ENTITY eacute "é"> and is referred to (and always was) in xmldoc as so: <?xml version="1.0" encoding="ISO-8859-1" ?> <!DOCTYPE DScribeDatabase SYSTEM "foo.dtd"
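A likely explanation, offered as an assumption since the excerpt ends before any answer: lxml's default parser does not load an external DTD, so entities declared in `foo.dtd` stay undefined even when the file sits next to the document. Passing a parser created with `load_dtd=True` asks it to read the DTD. The sketch below recreates the setup in a temporary directory; the file name `export.xml` and the element content are hypothetical, and the entity is written as a character reference to sidestep file-encoding questions.

```python
import os
import tempfile

from lxml import etree

with tempfile.TemporaryDirectory() as tmp:
    # DTD next to the document, declaring the entity.
    with open(os.path.join(tmp, "foo.dtd"), "w", encoding="ascii") as f:
        f.write('<!ENTITY eacute "&#233;">\n')

    # Document that references the DTD and uses the entity.
    xml_path = os.path.join(tmp, "export.xml")
    with open(xml_path, "w", encoding="ascii") as f:
        f.write(
            '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
            '<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">\n'
            '<DScribeDatabase>caf&eacute;</DScribeDatabase>\n'
        )

    # load_dtd=True makes the parser read foo.dtd, so &eacute; is defined.
    parser = etree.XMLParser(load_dtd=True)
    tree = etree.parse(xml_path, parser)
    text = tree.getroot().text

print(text)
```

Parsing by file path rather than an already opened file object also lets the parser resolve the relative `foo.dtd` reference against the document's location.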

lxml.etree insert elements into element.text

▼魔方 西西 submitted on 2019-12-02 02:04:08
I have strings that have empty xml elements in them, like this: >>> s = """fizz buzz <pb n="44"/> bananas""" These strings have been assigned to xml elements using the etree.SubElement method: >>> from lxml import etree as et >>> root = et.Element('root') >>> txt = et.SubElement(root, 'text') >>> txt.text = s >>> et.dump(root) <root> <text>fizz buzz &lt;pb n="44"/&gt; bananas</text> </root> Fiddling about with re.split() and etree's text and tail I can insert a subelement <pb n="44"/> where I want it in txt.text ; however, sometimes I've got multiple occurrences of the <pb/> element in the string,
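One possible approach, sketched under the assumption that each string is a well-formed XML fragment: wrap the string in the target tag and let the parser build the text, child elements, and tails, instead of splitting by hand. This handles any number of `<pb/>` occurrences. The second `<pb/>` and the word "cherries" are hypothetical additions to the sample string.

```python
from lxml import etree as et

# Sample string with two embedded <pb/> elements (second one is hypothetical).
s = 'fizz buzz <pb n="44"/> bananas <pb n="45"/> cherries'

root = et.Element('root')

# Parsing the wrapped string yields real <pb/> children with correct tails.
txt = et.fromstring('<text>{}</text>'.format(s))
root.append(txt)

serialized = et.tostring(root).decode()
print(serialized)
```

If a string contains a bare `&` or `<` that is not markup, this parse raises `XMLSyntaxError`, so such strings would need escaping or a fallback to plain `txt.text = s`.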

lipo: can't figure out the architecture type of: /var/folders/

我们两清 submitted on 2019-12-02 01:53:42
I tried installing lxml on Mac OS X Snow Leopard and keep getting the error: lipo: can't figure out the architecture type of: /var/folders/ I did install Xcode with 10.4 SDK support and I changed gcc 4.2 to 4.0.1. Any clues? Python 2.6.1 with Leopard 1.6.7.. running install running bdist_egg running egg_info writing src/lxml.egg-info/PKG-INFO writing top-level names to src/lxml.egg-info/top_level.txt writing dependency_links to src/lxml.egg-info/dependency_links.txt reading manifest file 'src/lxml.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching

Regular Expressions to parse template tags in XML

╄→尐↘猪︶ㄣ submitted on 2019-12-01 22:55:46
I need to parse some XML to pull out embedded template tags for further parsing. I can't seem to bend Python's regular expressions to do what I want, though. In English: when a template tag is contained anywhere in the row, remove all the XML for that specific row and leave only the template tag in its place. I put together a test case to demonstrate. Here's the original XML: <!-- regex_trial.xml --> <w:tbl> <w:tr> <w:tc><w:t>Header 1</w:t></w:tc> <w:tc><w:t>Header 2</w:t></w:tc> <w:tc><w:t>Header 3</w:t></w:tc> </w:tr> <w:tr> <w:tc><w:t>{% for i in items %}</w:t></w:tc> <w:tc><w:t></w:t></w
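The rule described in English above can be sketched with two small patterns and a replacement function: match each `<w:tr>...</w:tr>` row non-greedily, and if the row contains a `{% ... %}` tag, replace the whole row with that tag. This is a minimal sketch that assumes rows are not nested and that `<w:tr>` carries no attributes; the sample rows after the header (including the `{{ i.name }}` placeholders) are hypothetical, since the original XML is truncated.

```python
import re

# Hypothetical sample in the shape of regex_trial.xml.
xml = """<w:tbl>
<w:tr><w:tc><w:t>Header 1</w:t></w:tc><w:tc><w:t>Header 2</w:t></w:tc></w:tr>
<w:tr><w:tc><w:t>{% for i in items %}</w:t></w:tc><w:tc><w:t></w:t></w:tc></w:tr>
<w:tr><w:tc><w:t>{{ i.name }}</w:t></w:tc><w:tc><w:t>{{ i.price }}</w:t></w:tc></w:tr>
<w:tr><w:tc><w:t>{% endfor %}</w:t></w:tc><w:tc><w:t></w:t></w:tc></w:tr>
</w:tbl>"""

ROW = re.compile(r'<w:tr>.*?</w:tr>', re.DOTALL)  # one table row at a time
TAG = re.compile(r'\{%.*?%\}')                    # a block-level template tag


def collapse_row(match):
    """Return just the template tag if the row has one, else the row unchanged."""
    tag = TAG.search(match.group(0))
    return tag.group(0) if tag else match.group(0)


result = ROW.sub(collapse_row, xml)
print(result)
```

Doing the decision in a replacement function keeps each pattern simple, which tends to be easier to get right than one large expression that tries to express "row containing a tag" in a single regex.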

python lxml findall with multiple namespaces

我的未来我决定 submitted on 2019-12-01 21:42:37
I'm trying to parse an XML document with multiple namespaces with lxml, and I'm stuck on getting the findall() method to return something. My XML: <MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd"> <HistoryRecords> <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId> <List> <HistoryRecord> <Value>60</Value> <State>Valid</State> <TimeStamp>2016-04-20T12:40:00Z</TimeStamp> </HistoryRecord> </List> <
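A common reason `findall()` returns nothing on a document like this is the default namespace declared on the root: every unprefixed element belongs to it, so paths must name it. The sketch below maps a prefix of our own choosing (`r`, an arbitrary name) to that URI and passes the mapping to `findall()`. The XML is a shortened copy of the sample above, closed by hand because the original is truncated.

```python
from lxml import etree

# Shortened version of the document from the question.
xml = b"""<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <HistoryRecords>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

root = etree.fromstring(xml)

# Prefix "r" is arbitrary; only the URI has to match the document.
ns = {'r': 'http://www.company.com/common/rsp/2012/07'}

unqualified = root.findall('.//HistoryRecord')            # no namespace: no match
records = root.findall('.//r:HistoryRecord', namespaces=ns)
values = [rec.findtext('r:Value', namespaces=ns) for rec in records]

print(len(unqualified), values)
```

The same lookup can be written without a mapping using Clark notation, for example `'.//{http://www.company.com/common/rsp/2012/07}HistoryRecord'`.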

lxml.etree._Element.append() from a loop not working as expected

。_饼干妹妹 submitted on 2019-12-01 21:11:22
I would like to know why in this code append() seems to work from inside the loop, but the resulting xml displays the modification from only the last iteration, while remove() works as expected. This is an overly simplified example; I'm working with big chunks of data and need to append the same subtree to many different parents. from lxml import etree xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>') sub = etree.fromstring('<apple/>') for i, item in enumerate(xml): item.append(sub) print('Fruit {} with sub appended: {}'.format( i, etree.tostring(item).decode(
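In lxml an element has at most one parent, so appending an element that already sits in a tree moves it rather than copying it. That matches the behaviour described: each iteration pulls `sub` out of the previous `fruit`, and only the last one keeps it. A minimal sketch of the usual workaround is to append a `copy.deepcopy()` of the subtree for each parent.

```python
from copy import deepcopy

from lxml import etree

xml = etree.fromstring('<tree><fruit id="1"></fruit><fruit id="2"></fruit></tree>')
sub = etree.fromstring('<apple/>')

for item in xml:
    # Each parent gets its own independent copy of the subtree.
    item.append(deepcopy(sub))

result = etree.tostring(xml).decode()
print(result)
```

For large subtrees the copies cost memory proportional to the number of parents, which is unavoidable if every parent must contain the subtree in the serialized output.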