lxml

Get all children of specific node in Python

Submitted by 本秂侑毒 on 2020-01-25 07:30:28
Question: I have the following example.xml structure:

<ParentOne>
    <SiblingOneA>This is Sibling One A</SiblingOneA>
    <SiblingTwoA>
        <ChildOneA>Value of child one A</ChildOneA>
        <ChildTwoA>Value of child two A</ChildTwoA>
    </SiblingTwoA>
</ParentOne>
<ParentTwo>
    <SiblingOneA>This is a different value for Sibling one A</SiblingOneA>
    <SiblingTwoA>
        <ChildOneA>This is a different value for Child one A</ChildOneA>
        <ChildTwoA>This is a different value for Child Two A</ChildTwoA>
    </SiblingTwoA>
</ParentTwo>
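A minimal lxml sketch of listing the direct children of one node, assuming the fragments above are saved in example.xml under a single wrapping root element (as shown, two top-level elements would not by themselves be well-formed XML):

from lxml import etree

tree = etree.parse('example.xml')                 # assumes a single root wrapping the fragments
parent = tree.find('.//ParentOne/SiblingTwoA')    # the node whose children we want
for child in parent:                              # iterating an element yields its direct children
    print(child.tag, child.text)
# ChildOneA Value of child one A
# ChildTwoA Value of child two A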

Python lxml - returns null list

Submitted by 孤街浪徒 on 2020-01-25 05:58:02
Question: I cannot figure out what is wrong with the XPath when trying to extract a value from a table on a web page. The approach seems correct, as I can extract the page title and other attributes, but I cannot extract the third value; it always returns an empty list.

from lxml import html
import requests

test_url = 'SC312226'
page = ('https://www.opencompany.co.uk/company/'+test_url)
print 'Now searching URL: '+page
data = requests.get(page)
tree = html.fromstring(data.text)
print tree.xpath('//title/text()'
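The excerpt cuts off before the failing table XPath, so only a general debugging sketch is possible here. One common reason an otherwise correct XPath returns an empty list is that the value is injected by JavaScript after page load and therefore never appears in the HTML that requests downloads. A quick check (Python 2 style to match the question; 'expected value' is a placeholder for the text being looked for):

from lxml import html
import requests

data = requests.get('https://www.opencompany.co.uk/company/SC312226')
tree = html.fromstring(data.content)      # parse bytes to avoid encoding guesses

print tree.xpath('//title/text()')        # works, per the question
print 'expected value' in data.text       # False suggests the value is added by JavaScript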

using beautifulsoup 4 for xml causes strange behaviour (memory issues?)

Submitted by 断了今生、忘了曾经 on 2020-01-25 03:37:30
Question: I'm getting strange behaviour with this:

>>> from bs4 import BeautifulSoup
>>> smallfile = 'small.xml'    # approx 600 bytes
>>> largerfile = 'larger.xml'  # approx 2300 bytes
>>> len(BeautifulSoup(open(smallfile, 'r'), ['lxml', 'xml']))
1
>>> len(BeautifulSoup(open(largerfile, 'r'), ['lxml', 'xml']))
0

Contents of small.xml:

<?xml version="1.0" encoding="us-ascii"?>
<Catalog>
    <CMoverMissile id="HunterSeekerMissile">
        <MotionPhases index="1">
            <Driver value="Guidance"/>
            <Acceleration value="3200"/>
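The excerpt stops mid-file, so the root cause cannot be pinned down here. A minimal cross-check is to parse both files with BeautifulSoup's dedicated XML mode (the single 'xml' feature string, which uses lxml under the hood) and with lxml.etree directly, to see whether the larger document is being rejected as malformed; the file names are taken from the question:

from bs4 import BeautifulSoup
from lxml import etree

for path in ('small.xml', 'larger.xml'):
    with open(path, 'r') as f:
        soup = BeautifulSoup(f, 'xml')    # dedicated XML parser (requires lxml)
    print(path, len(soup.contents), soup.find('CMoverMissile') is not None)

    # Parse with lxml.etree as well; a parse error here points at the document, not bs4
    tree = etree.parse(path)
    print(path, tree.getroot().tag)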

Packaging a Python program with py2exe in practice

Submitted by 六月ゝ 毕业季﹏ on 2020-01-24 23:53:57
This article is reposted from: https://www.cnblogs.com/blueel/archive/2012/12/26/2834107.html Author: blueel. Please keep this notice when reposting.

I have been learning Python recently, so I wrote a script in Python: a parking space management system (heh, back in university I built a similar system in PHP and even won an award for it). But the program still has a big limitation: to use it, you first have to install a Python environment, which is inconvenient. So I wanted to package the program for release first. py2exe is the usual tool for this, so I decided to use it this time as well. OK, let's get started...

Preparation: install py2exe, then write setup.py as follows:

#-*-coding: UTF-8-*-
from distutils.core import setup
import py2exe
# Powered by ***
INCLUDES = []
options = {"py2exe" :
    {"compressed" : 1,
     "optimize" : 2,
     "bundle_files" : 2,
     "includes" : INCLUDES,
     "dll_excludes": [ "MSVCP90.dll", "mswsock.dll", "powrprof.dll","w9xpopen.exe"] }}
setup(
    options =
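The excerpt is cut off inside the setup() call. For readers who want to see the shape of a complete py2exe setup script, here is a hedged sketch of a typical completion; main.py is a hypothetical entry script, not the author's actual file:

# -*- coding: UTF-8 -*-
# Sketch of a complete py2exe setup.py; "main.py" is a placeholder entry script.
from distutils.core import setup
import py2exe

INCLUDES = []
options = {"py2exe": {
    "compressed": 1,
    "optimize": 2,
    "bundle_files": 2,
    "includes": INCLUDES,
    "dll_excludes": ["MSVCP90.dll", "mswsock.dll", "powrprof.dll", "w9xpopen.exe"],
}}

setup(
    options=options,
    zipfile=None,                       # merge the library archive into the executable
    console=[{"script": "main.py"}],    # use windows=[...] instead for a GUI program
)

Running python setup.py py2exe then produces the executable under the dist directory.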

Decoding base64 image data using xslt

Submitted by 六月ゝ 毕业季﹏ on 2020-01-24 19:35:09
Question: I am trying to have a single XML file and at most one XSL stylesheet. The contents of the XML file look like this:

<catalogue>
    <item>
        <item_id>1234</item_id>
        <item_desc>hi-fi sanio</item_desc>
        <price>12.50</price>
        <image>iVBORw0KGgoAAAANSUhEUgAAANIAAAAzCAYAAADigVZlAAA</image>
    </item>
    <item>
        <item_id>4614</item_id>
        <item_desc>lace work</item_desc>
        <price>1.50</price>
        <image>QN0lEQVR4nO2dCXQTxxnHl0LT5jVteHlN+5q+JCKBJITLmHIfKzBHHCCYBAiEw</image>
    </item>
    <item>
        <item_id>614</item_id>
        <item_desc>bicycle
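XSLT 1.0 cannot decode base64 by itself; the usual workaround is to let the browser do the decoding by emitting a data: URI in the generated HTML. A minimal lxml sketch of that idea, assuming the images are PNGs and the document above is saved as catalogue.xml (both assumptions, not stated in the question):

from lxml import etree

xslt_root = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/catalogue">
    <html><body>
      <xsl:for-each select="item">
        <p>
          <xsl:value-of select="item_desc"/> (<xsl:value-of select="price"/>)
        </p>
        <img src="data:image/png;base64,{image}"/>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(xslt_root)
doc = etree.parse('catalogue.xml')
print(str(transform(doc)))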

lxml cannot import element etree

Submitted by 一笑奈何 on 2020-01-24 09:11:10
Question: I downloaded the source of lxml, unpacked it, and typed "python setup.py install"; everything went fine. But now, typing import lxml, I get:

Traceback (most recent call last):
  File "lxml.py", line 1, in <module>
    from lxml import etree
  File "/Soft/fox_dev/dev/ut1u3h/dev/lxml.py", line 1, in <module>
    from lxml import etree
ImportError: cannot import name etree

Before you offer me any solution, I must tell you, our sys admins and network admins are control/security freaks. I am not root I
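Not an answer from the thread, just an observation from the traceback: both frames point at a local file named lxml.py, which suggests that file is shadowing the installed lxml package. A quick way to check which file Python resolves the name to (Python 3 syntax shown; the question itself predates it):

import importlib.util

spec = importlib.util.find_spec("lxml")
print(spec.origin)
# If this prints something like /Soft/fox_dev/dev/ut1u3h/dev/lxml.py,
# rename that local file (and remove its .pyc) so the real package can be imported.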

Three common crawler parsing syntaxes for Python data collection: the XPath installment

Submitted by 拟墨画扇 on 2020-01-24 07:49:54
Before getting into XPath syntax, we first need to get to know the lxml library; even if we knew the syntax, it would be useless without library support. Without further ado, let's get straight to the point.

1. The lxml library

Basic concept: lxml is a parsing library for Python. It supports XPath-style parsing and can be used to parse XML; since HTML and XML are structurally similar (both are trees), lxml can parse HTML as well.

A commonly used lxml module: etree. In my understanding, the etree module is what you use to turn a crawled HTML page into something you can work with. A few of its common uses (a runnable sketch follows at the end of this section):

1. Convert text into an HTML object (the HTML method):
   html = etree.HTML(text)

2. Convert the object back into HTML text:
   html = etree.HTML(text)
   result = etree.tostring(html)

3. Parse a page file and return an HTML object:
   html = etree.parse('text.html')

Of course this module has more methods than these, and the lxml library has more modules than etree; my own ability is limited, so I cannot cover them all here. If you are interested, dig deeper yourself. This is only a rough introduction to the methods most commonly used when a crawler parses a page.

Installing lxml:
1. Command-line install: press Win+R, type cmd to enter the terminal (with Python's environment variables configured).
2. Install the library package from within your development tool (IDE).
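A small runnable sketch pulling together the three etree calls listed above, plus an XPath query; the markup string and the file name are placeholders, not from the original article:

from lxml import etree

text = '<html><body><h1>hello</h1></body></html>'

# 1. Turn a string of markup into an HTML element tree
html = etree.HTML(text)

# 2. Serialize the element tree back to markup (bytes)
result = etree.tostring(html)
print(result)

# 3. Parse an HTML file from disk (written here so the example is self-contained;
#    without the HTMLParser, etree.parse expects well-formed XML)
with open('text.html', 'w') as f:
    f.write(text)
tree = etree.parse('text.html', etree.HTMLParser())

# With a tree in hand, XPath expressions can be evaluated against it
print(html.xpath('//h1/text()'))   # ['hello']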

Drop all namespaces in lxml?

Submitted by 房东的猫 on 2020-01-24 05:19:40
Question: I'm working with some of Google's data APIs, using the lxml library in Python. Namespaces are a huge hassle here. For a lot of the work I'm doing (XPath stuff, mainly), it would be nice to just plain ignore them. Is there a simple way to ignore XML namespaces in Python/lxml? Thanks!

Answer 1: If you'd like to remove all namespaces from elements and attributes, I suggest the code shown below. Context: in my application I'm obtaining XML representations of SOAP response streams, but I'm not
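The answer's code is cut off in this excerpt. As a stand-in, here is a hedged sketch of the general technique it describes (rewriting every element and attribute under its local name, then dropping the now-unused namespace declarations); this is not necessarily the answerer's exact code:

from lxml import etree

def strip_namespaces(root):
    """Rewrite every element and attribute name to its local (un-namespaced) form."""
    for el in root.iter():
        if not isinstance(el.tag, str):      # skip comments and processing instructions
            continue
        el.tag = etree.QName(el).localname
        for name, value in list(el.attrib.items()):
            local = etree.QName(name).localname
            if local != name:
                del el.attrib[name]
                el.attrib[local] = value
    etree.cleanup_namespaces(root)           # remove xmlns declarations no longer in use
    return root

# Usage: plain XPath expressions work without namespace prefixes afterwards
root = strip_namespaces(etree.fromstring(
    b'<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>hi</title></entry></feed>'))
print(root.xpath('//entry/title/text()'))    # ['hi']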