lxml

How to find all guide IDs and pages with IMG tags in XML export with lxml/xpath?

不打扰是莪最后的温柔 submitted on 2019-12-11 17:55:53
Question: How can I parse the XML below to find, for each GUIDE, its ID and URL, and then, for each PAGE inside the GUIDE, the page ID and any images that appear inside BOXES / BOX / ASSETS / DESCRIPTION? The images are in HTML format, so I need to grab the source from each image. <guide> <id></id> <url></url> <group> <id></id> <type></type> <name></name> </group> <pages> <page> <id></id> <name></name> <description></description> <boxes> <box> <id></id> <name></name> <type></type> <map_id></map_id>
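A minimal sketch of one way to approach this with lxml, not a confirmed answer from the thread. The excerpt cuts off inside <box>, so the nesting below it (assets / asset / description), the wrapping <guides> element, and all sample values are assumptions; `images_by_page` is a hypothetical helper name. The idea is to walk guide, then page, then parse each description's escaped HTML with lxml.html and collect the img src values.

```python
from lxml import etree, html

# Hypothetical sample: element names follow the question, but everything
# below <box> and every value is assumed for illustration.
SAMPLE = b"""<guides>
  <guide>
    <id>10</id>
    <url>http://example.com/guide/10</url>
    <pages>
      <page>
        <id>100</id>
        <boxes>
          <box>
            <id>1000</id>
            <assets>
              <asset>
                <description>&lt;p&gt;&lt;img src="a.png"&gt; text&lt;/p&gt;</description>
              </asset>
            </assets>
          </box>
        </boxes>
      </page>
      <page>
        <id>101</id>
        <boxes/>
      </page>
    </pages>
  </guide>
</guides>"""


def images_by_page(xml_bytes):
    """Return (guide id, guide url, page id, [img src, ...]) for pages with images."""
    root = etree.fromstring(xml_bytes)
    result = []
    for guide in root.iter("guide"):
        guide_id = guide.findtext("id")
        guide_url = guide.findtext("url")
        for page in guide.iterfind("pages/page"):
            sources = []
            # '//' matches description at any depth below <assets>.
            for desc in page.iterfind("boxes/box/assets//description"):
                if desc.text and desc.text.strip():
                    # The description holds HTML as text, so parse it separately.
                    fragment = html.fromstring(desc.text)
                    sources.extend(fragment.xpath("descendant-or-self::img/@src"))
            if sources:
                result.append((guide_id, guide_url, page.findtext("id"), sources))
    return result


print(images_by_page(SAMPLE))
```

Pages without any image are skipped here; drop the `if sources:` check to list every page.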

lxml::etree::_ElementStringResult.getparent() works incorrectly

对着背影说爱祢 submitted on 2019-12-11 16:34:15
Question: I could not find anyone explaining this error. I'm using lxml 3.1.0. Given HTML/XML like this: <h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1> the _ElementStringResult for the string " XT 100 LV Ski Boots 2014" is returned when we run: >> elemstr = tree.xpath('//body//h1/text()')[0] However, when we then run the following, we get: >> parent = elemstr.getparent() >> tree.getpath(parent) /html/body/therestofthepath/h1/strong Did anyone have a problem
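For context, a short sketch of why this happens. In lxml's tree model the string after </strong> is stored as the tail of the <strong> element, and for tail text getparent() returns the element that owns the tail rather than the enclosing element. The XPath string results expose is_text and is_tail for telling the two cases apart. `containing_element` is a hypothetical helper name, and the surrounding <html><body> wrapper is assumed.

```python
from lxml import html

doc = html.fromstring(
    '<html><body><h1 class="fn"><strong class="brand">Lange</strong>'
    ' XT 100 LV Ski Boots 2014</h1></body></html>'
)


def containing_element(text_result):
    """Return the element that visually contains an XPath text() result."""
    owner = text_result.getparent()
    # Tail text hangs off the preceding sibling element, so step up once more.
    return owner.getparent() if text_result.is_tail else owner


elemstr = doc.xpath('//body//h1/text()')[0]
print(elemstr.is_tail)                   # tail of <strong>, not text of <h1>
print(elemstr.getparent().tag)           # the element owning the tail
print(containing_element(elemstr).tag)   # the enclosing element
```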

parse html content by passing custom date input

▼魔方 西西 submitted on 2019-12-11 15:30:52
Question: I am parsing data from here. On the webpage I can get the data for, say, yesterday by selecting the desired date. How can I pass a custom date so that I parse the data for that date (e.g. yesterday)? Answer 1: You can either use Selenium or use the site's AJAX API. Here is an example of the latter: def get_by_date(date): url = 'https://markets.ft.com/data/world/ajax/getnextecoevents?startDate=' + date r = requests.get(url) return r.json()['html'] date should be formatted as yyyy
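The answer's function, laid out as a sketch with the URL building split off so it can be checked without a network call. The endpoint and the 'html' key come from the answer; `build_events_url` is a hypothetical helper name, the timeout and status check are additions, and the date format is left to the caller because the excerpt cuts off before stating it in full.

```python
from urllib.parse import urlencode

BASE_URL = "https://markets.ft.com/data/world/ajax/getnextecoevents"


def build_events_url(date):
    """Build the request URL; `date` must already be in the format the site expects."""
    return BASE_URL + "?" + urlencode({"startDate": date})


def get_by_date(date):
    """Fetch the HTML fragment for one date (performs a network request)."""
    import requests  # third-party; imported here so URL building works without it

    r = requests.get(build_events_url(date), timeout=10)
    r.raise_for_status()
    return r.json()["html"]
```

The returned fragment could then be handed to lxml.html.fromstring for parsing, assuming the endpoint still responds as the answer describes.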

Accessing !ENTITY statement and reference

淺唱寂寞╮ submitted on 2019-12-11 15:20:15
Question: I have some XML files with !ENTITY definitions and &file_reference; references, and I can process these successfully. However, I would like to preprocess the files and access the !ENTITY definitions to extract the file names, and also find the &file_references and which section of the XML they are in. An example XML file looks like: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE gdml [ <!ENTITY materials SYSTEM "materialsOptical.xml"> <!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml"> <!ENTITY matrices_Mainz_v2
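A rough sketch of one possible preprocessing pass, under stated assumptions. The two entity declarations are the ones visible in the excerpt; the document body is invented for illustration. The declarations are pulled out with a plain regular expression (a shortcut that only handles the simple SYSTEM "file" form, not a DTD parser), and the references are located by parsing with resolve_entities=False so that lxml keeps them as entity nodes whose parent is the enclosing section.

```python
import re

from lxml import etree

# Hypothetical sample: declarations as in the question, body assumed.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gdml [
<!ENTITY materials SYSTEM "materialsOptical.xml">
<!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml">
]>
<gdml>
  <materials>&materials;</materials>
  <solids>&solids_Mainz_v2;</solids>
</gdml>"""

# Entity name -> file name, taken straight from the raw declarations.
ENTITY_RE = re.compile(rb'<!ENTITY\s+(\S+)\s+SYSTEM\s+"([^"]+)"\s*>')
definitions = {
    name.decode(): path.decode() for name, path in ENTITY_RE.findall(SAMPLE)
}

# Keep &name; references in the tree instead of expanding them.
parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(SAMPLE, parser)

# (entity name, tag of the element that contains the reference)
references = [(ref.name, ref.getparent().tag) for ref in root.iter(etree.Entity)]

print(definitions)
print(references)
```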

Cx_freeze with lxml.html TypeError

六眼飞鱼酱① submitted on 2019-12-11 14:54:48
Question: import lxml.html gives me an error when I try to compile with cx_freeze: Traceback (most recent call last): File "C:\Python27\Scripts\cxfreeze", line 5, in <module> main() File "C:\Python27\lib\site-packages\cx_Freeze\main.py", line 188, in main freezer.Freeze() File "C:\Python27\lib\site-packages\cx_Freeze\freezer.py", line 572, in Freeze self._FreezeExecutable(executable) File "C:\Python27\lib\site-packages\cx_Freeze\freezer.py", line 186, in _FreezeExecutable exe.copyDependentFiles,

No nested nodes. How to get one piece of information and then to get additional info respectively?

强颜欢笑 submitted on 2019-12-11 14:36:14
Question: For the code below, I need to get the dates and, for each date, its times, hrefs, formats, and so on (not all shown). <div class="showtimes"> <h2>The Little Prince</h2> <div class="poster" data-poster-url="http://www.test.com"> <img src="http://www.test.com"> </div> <div class="showstimes"> <div class="date">9 December, Wednesday</div> <span class="show-time techno-3d"> <a href="http://www.test.com" class="link">12:30</a> <span class="show-format">3D</span> </span> <span class="show-time techno-3d"> <a href=
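A sketch of one way to group the flat siblings, assuming the markup continues the pattern shown (a date div followed by its show-time spans, then the next date div). The second date, the extra showings, and the distinct hrefs are invented for illustration, and `shows_by_date` is a hypothetical helper name. For each date div, the code walks its following siblings until the next date div appears.

```python
from lxml import html

# Hypothetical sample extending the truncated markup from the question.
SAMPLE = """
<div class="showtimes">
  <h2>The Little Prince</h2>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span class="show-time techno-3d">
      <a href="http://www.test.com/1" class="link">12:30</a>
      <span class="show-format">3D</span>
    </span>
    <span class="show-time techno-2d">
      <a href="http://www.test.com/2" class="link">15:00</a>
      <span class="show-format">2D</span>
    </span>
    <div class="date">10 December, Thursday</div>
    <span class="show-time techno-3d">
      <a href="http://www.test.com/3" class="link">18:45</a>
      <span class="show-format">3D</span>
    </span>
  </div>
</div>
"""


def shows_by_date(markup):
    """Map each date label to a list of (time, href, format) tuples."""
    root = html.fromstring(markup)
    schedule = {}
    for date_div in root.xpath('//div[@class="date"]'):
        shows = []
        for sib in date_div.itersiblings():
            if not isinstance(sib.tag, str):
                continue  # skip comments and processing instructions
            classes = (sib.get("class") or "").split()
            if "date" in classes:
                break  # the next date starts here
            if "show-time" in classes:
                link = sib.find("a")
                fmt = sib.findtext("span[@class='show-format']")
                shows.append((link.text, link.get("href"), fmt))
        schedule[date_div.text_content().strip()] = shows
    return schedule


print(shows_by_date(SAMPLE))
```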

How to add an attribute to a tag found using xpath in lxml in Python?

吃可爱长大的小学妹 submitted on 2019-12-11 14:27:13
Question: I have the following XML: <draw:image></draw:image> I want to add multiple xlink attributes to it to make it: <draw:image xlink:href="image" xlink:show="embed"></draw:image> I tried the following code but got the error "ValueError: Invalid attribute name u'xlink:href'": root.xpath("//draw:image", namespaces= {"draw":"urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"}) [0].attrib['xlink:href'] = 'image' What am I doing wrong? There seems to be something related to namespaces, but I
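A minimal sketch of the usual fix: lxml does not accept prefix:name as an attribute name, so a namespaced attribute is written in {namespace-uri}name form. The draw namespace URI is the one from the question; the XLink URI is the standard W3C one; the wrapping <root> element that declares both prefixes is an assumption made to keep the sample self-contained.

```python
from lxml import etree

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical wrapper element declaring both prefixes.
SAMPLE = (
    f'<root xmlns:draw="{DRAW_NS}" xmlns:xlink="{XLINK_NS}">'
    "<draw:image></draw:image>"
    "</root>"
)

root = etree.fromstring(SAMPLE)
image = root.xpath("//draw:image", namespaces={"draw": DRAW_NS})[0]

# Namespaced attributes use Clark notation: {uri}localname.
image.set(f"{{{XLINK_NS}}}href", "image")
image.set(f"{{{XLINK_NS}}}show", "embed")

print(etree.tostring(root).decode())
```

Because the xlink prefix is already declared on the wrapper, the serializer reuses it when writing the new attributes.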

parsing xml by python lxml tree.xpath

烈酒焚心 submitted on 2019-12-11 14:07:15
Question: I am trying to parse a huge file; a sample is below. I am trying to get <Name>, but I can't. It works only without this string: <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> xml2 = '''<?xml version="1.0" encoding="UTF-8"?> <PackageLevelLayout> <LevelLayouts> <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432"> <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain" xmlns
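A sketch of the likely cause and a common workaround. The inner <LevelLayout> declares a default namespace, so it and its children (including <Name>) live in that namespace, and an unprefixed XPath step such as //Name only matches elements in no namespace. Binding the URI to a prefix of your choosing (here the arbitrary `d`) lets XPath address them. The sample is cut down from the question; the position and content of <Name> are assumed.

```python
from lxml import etree

# The prefix 'd' is arbitrary; only the URI has to match the document.
NS = {"d": "http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain"}

# Hypothetical, shortened sample; the <Name> value is made up.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<PackageLevelLayout>
  <LevelLayouts>
    <LevelLayout levelGuid="4a54f032-325e-4988-8621-2cb7b49d8432">
      <LevelLayout xmlns="http://schemas.datacontract.org/2004/07/ArcherTech.Common.Domain"
                   xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <Name>Example layout</Name>
      </LevelLayout>
    </LevelLayout>
  </LevelLayouts>
</PackageLevelLayout>"""

root = etree.fromstring(SAMPLE)

unprefixed = root.xpath("//Name")  # finds nothing: Name is namespaced
names = root.xpath("//d:LevelLayout/d:Name/text()", namespaces=NS)

print(unprefixed, names)
```

Note that the outer <LevelLayout> carries no namespace, so it is still matched by the plain step LevelLayout rather than d:LevelLayout.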

Python - Same xpath in selenium and lxml different results

你说的曾经没有我的故事 submitted on 2019-12-11 14:02:22
Question: I have this site http://www.google-proxy.net/ and I need to get the first proxy's ip:port. br = webdriver.Firefox() br.get("http://www.google-proxy.net/") ip = br.find_element_by_xpath("//tr[@class='odd']/td[1]").text; time.sleep(random.uniform(1, 1)) port = br.find_element_by_xpath("//tr[@class='odd']/td[2]").text; time.sleep(random.uniform(1, 1)) and it works fine. But now I want to do the same with lxml: page = requests.get(proxy_server) root = lxml.html.fromstring(page.text) ip = root.xpath("/

XSLT 1.0: max value of a date node

别来无恙 submitted on 2019-12-11 13:11:59
Question: Given the following XML: <Ergebnisse> <Spiel> <Datum>2013-10-02</Datum> </Spiel> <Spiel> <Datum>2013-10-03</Datum> </Spiel> <Spiel> <Datum>2013-10-03</Datum> </Spiel> <Spiel> <Datum>2013-10-03</Datum> </Spiel> <Spiel> <Datum>2013-10-06</Datum> </Spiel> <Spiel> <Datum>2013-10-06</Datum> </Spiel> <Spiel> <Datum>2013-10-06</Datum> </Spiel> <Spiel> <Datum>2013-10-06</Datum> </Spiel> <Spiel> <Datum>2014-05-01</Datum> </Spiel> <Spiel> <Datum>2014-05-01</Datum> </Spiel> <Spiel> <Datum>2014-04-27</Datum>
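A sketch of a common XSLT 1.0 idiom for this, run through lxml so it can be tried directly. XSLT 1.0 has no max() for dates, so the Datum nodes are sorted in descending order and only the first one is output. The dashes are stripped with translate() so that ISO dates sort as numbers. The input is a shortened, assumed subset of the question's data, since the excerpt is cut off.

```python
from lxml import etree

# Shortened sample; the full list in the question is truncated.
XML = b"""<Ergebnisse>
  <Spiel><Datum>2013-10-02</Datum></Spiel>
  <Spiel><Datum>2014-05-01</Datum></Spiel>
  <Spiel><Datum>2014-04-27</Datum></Spiel>
</Ergebnisse>"""

STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/Ergebnisse">
    <xsl:for-each select="Spiel/Datum">
      <!-- 2014-05-01 becomes 20140501, which sorts correctly as a number -->
      <xsl:sort select="translate(., '-', '')" data-type="number"
                order="descending"/>
      <xsl:if test="position() = 1">
        <xsl:value-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
latest = str(transform(etree.fromstring(XML))).strip()
print(latest)
```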