lxml

How do I match contents of an element in XPath (lxml)?

Submitted by 我只是一个虾纸丫 on 2019-12-30 02:13:26
Question: I want to parse HTML with lxml using XPath expressions. My problem is matching the contents of a tag. For example, given the <a href="http://something">Example</a> element, I can match the href attribute using .//a[@href='http://something'], but given the expression .//a[.='Example'] or even .//a[contains(.,'Example')], lxml throws the 'invalid node predicate' exception. What am I doing wrong? EDIT: Example code: from lxml import etree from cStringIO import StringIO html = '<a href="http:
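
The question's code is cut off, so the exact cause cannot be confirmed here. One possible cause, assuming the code called find() or findall(), is that those methods implement only the limited ElementPath subset, while the xpath() method evaluates full XPath 1.0 predicates. A minimal sketch under that assumption (the sample HTML string is made up for illustration):

```python
from lxml import etree

# Sample markup modelled on the question; the surrounding tags are an assumption.
html = '<html><body><a href="http://something">Example</a></body></html>'
tree = etree.HTML(html)

# xpath() supports full XPath 1.0, including string comparisons on ".".
by_href = tree.xpath(".//a[@href='http://something']")
exact = tree.xpath(".//a[.='Example']")
partial = tree.xpath(".//a[contains(., 'Exam')]")

for link in exact:
    print(link.get("href"), link.text)
```

All three expressions select the same single <a> element in this sample.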

Remove all javascript tags and style tags from html with python and the lxml module

Submitted by 北慕城南 on 2019-12-29 11:34:26
Question: I am parsing an HTML document using the http://lxml.de/ library. So far I have figured out how to strip tags from an HTML document (see "In lxml, how do I remove a tag but retain all contents?"), but the method described in that post leaves all the text, stripping the tags without removing the actual script. I have also found a class reference for lxml.html.clean.Cleaner at http://lxml.de/api/lxml.html.clean.Cleaner-class.html, but it is clear as mud how to actually use the class to clean the
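
As a hedged illustration, the sketch below does not use the Cleaner class mentioned in the question. It uses etree.strip_elements() instead, which drops the named elements together with their content. The sample document is an assumption made up for the example.

```python
from lxml import etree, html

# Made-up sample page containing a <style> and a <script> element.
page = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p>Hello</p><script>alert(1)</script> world</body></html>"
)
doc = html.fromstring(page)

# Remove <script> and <style> elements including their content.
# with_tail=False keeps the text that follows each removed element.
etree.strip_elements(doc, "script", "style", with_tail=False)

cleaned = html.tostring(doc, encoding="unicode")
print(cleaned)
```

By contrast, strip_tags() removes only the tags and keeps their text, which matches the behaviour the question complains about.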

Entity references and lxml

Submitted by 浪子不回头ぞ on 2019-12-29 07:34:12
Question: Here's the code I have: from cStringIO import StringIO from lxml import etree xml = StringIO('''<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE root [ <!ENTITY test "This is a test"> ]> <root> <sub>&test;</sub> </root>''') d1 = etree.parse(xml) print '%r' % d1.find('/sub').text parser = etree.XMLParser(resolve_entities=False) d2 = etree.parse(xml, parser=parser) print '%r' % d2.find('/sub').text Here's the output: 'This is a test' None How do I get lxml to give me '&test;' , i.e., the raw
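
A possible approach, sketched in Python 3 rather than the question's Python 2: with resolve_entities=False the reference is kept as a child node of <sub>, which is why sub.text is None. The entity node can then be read directly, or the element can be serialized.

```python
from lxml import etree

# Same document as in the question, passed as bytes because it carries
# an XML declaration with an encoding.
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [ <!ENTITY test "This is a test"> ]>
<root> <sub>&test;</sub> </root>"""

parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(xml, parser)
sub = root.find("sub")

# The unresolved reference is a child node of <sub>, not part of sub.text.
entity = sub[0]
raw_reference = entity.text
serialized = etree.tostring(sub, encoding="unicode")

print(repr(sub.text), repr(raw_reference), serialized)
```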

pip install lxml mysql-python error

Submitted by 送分小仙女□ on 2019-12-29 05:42:33
Problem 0: When installing mysql-python, the following error appears: sh: mysql_config: not found Traceback (most recent call last): File "setup.py", line 15, in <module> metadata, options = get_config() File "/home/zhxia/apps/source/MySQL-python-1.2.3/setup_posix.py", line 43, in get_config libs = mysql_config("libs_r") File "/home/zhxia/apps/source/MySQL-python-1.2.3/setup_posix.py", line 24, in mysql_config raise EnvironmentError("%s not found" % (mysql_config.path,)) EnvironmentError: mysql_config not found The main cause is that libmysqlclient-dev is not installed. Install it: sudo apt-get install libmysqlclient-dev Then find the path of the mysql_config file: sudo updatedb locate mysql_config
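
Before rerunning the install, it can help to confirm that the build tool is actually on the PATH. The helper below is a hypothetical convenience (the name missing_tools is not part of any library) built on the standard library's shutil.which().

```python
import shutil


def missing_tools(tools):
    """Return the subset of tool names that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# mysql_config is the program the traceback above reports as missing.
missing = missing_tools(["mysql_config"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("mysql_config found at", shutil.which("mysql_config"))
```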

lxml: add namespace to input file

Submitted by 不羁的心 on 2019-12-28 04:24:06
Question: I am parsing an XML file generated by an external program. I would then like to add custom annotations to this file, using my own namespace. My input looks like this: <sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4"> <model metaid="untitled" id="untitled"> <annotation>...</annotation> <listOfUnitDefinitions>...</listOfUnitDefinitions> <listOfCompartments>...</listOfCompartments> <listOfSpecies> <species
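
One way this might be done is to create the new child with its own nsmap, so lxml declares the prefix on that element. In the sketch below the namespace URI, the "my" prefix, the <note> element name, and the shortened input are all assumptions for illustration.

```python
from lxml import etree

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
MY_NS = "http://example.org/my-annotations"  # hypothetical custom namespace

# Shortened stand-in for the question's input document.
source = (
    b'<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">'
    b'<model metaid="untitled" id="untitled"><annotation/></model></sbml>'
)
root = etree.fromstring(source)

# Elements in the default namespace must be addressed with their full name.
annotation = root.find(".//{%s}annotation" % SBML_NS)

# Passing nsmap makes lxml declare the "my" prefix on the new element.
note = etree.SubElement(annotation, "{%s}note" % MY_NS, nsmap={"my": MY_NS})
note.text = "custom annotation"

result = etree.tostring(root, encoding="unicode")
print(result)
```

The nsmap of an already parsed element is read-only, which is why the declaration is attached to the newly created element here rather than to the root.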

libxml install error using pip

Submitted by 一笑奈何 on 2019-12-27 10:28:14
Question: This is my error: (mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install lxml Downloading/unpacking lxml Running setup.py egg_info for package lxml Building lxml version 2.3. Building without Cython. ERROR: /bin/sh: xslt-config: not found ** make sure the development packages of libxml2 and libxslt are installed ** Using build configuration of libxslt Installing collected packages: lxml Running setup.py install for lxml Building lxml version 2.3. Building without Cython. ERROR: /bin
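
Once the libxml2 and libxslt development packages named in the error are in place and the install succeeds, the result can be checked from Python. This small sketch prints the version constants that lxml.etree exposes; it is a verification aid, not a fix for the build error itself.

```python
from lxml import etree

# Version tuples exposed by lxml.etree for lxml and the C libraries it uses.
versions = {
    "lxml": etree.LXML_VERSION,
    "libxml2 (running)": etree.LIBXML_VERSION,
    "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    "libxslt (running)": etree.LIBXSLT_VERSION,
    "libxslt (compiled against)": etree.LIBXSLT_COMPILED_VERSION,
}

for name, version in versions.items():
    print(name, ".".join(str(part) for part in version))
```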

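Python web scraping: using lxml-etree together with XPath

Submitted by 喜夏-厌秋 on 2019-12-26 17:10:52
This article introduces how to use lxml-etree together with XPath in Python web scraping (with examples). The content is detailed, and I hope it helps. lxml: an HTML/XML parser for Python. Official documentation: https://lxml.de/ The lxml package must be installed before use. In a terminal (on Windows 7, 8, or 10, in cmd), run: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple lxml Features: 1. Parse HTML: use etree.HTML(text) to parse an HTML fragment given as a string into an HTML document. 2. Read XML files. 3. Use etree together with XPath (the main topic of this article). Example: using etree together with XPath:
# Read a file with lxml-etree
from lxml import etree
xml = etree.parse("./py24.xml")
print(type(xml))
# Find all book nodes
rst = xml.xpath('//book')
print(type(rst))
print(rst)
# Find elements whose category attribute has the value sport
rst2 = xml.xpath('//book[@category="sport"]')
print(type(rst2))
print(rst2)
#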
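
The file ./py24.xml is not shown in the article, so the following self-contained variant uses a made-up inline document with the same book/category shape. The element contents are assumptions for illustration only.

```python
from lxml import etree

# Made-up stand-in for ./py24.xml.
xml_text = """
<bookstore>
  <book category="sport"><title>Running Basics</title></book>
  <book category="cooking"><title>Simple Pasta</title></book>
</bookstore>
"""
root = etree.fromstring(xml_text)

# xpath() returns a list of matching elements.
all_books = root.xpath("//book")

# Filter on an attribute value.
sport_books = root.xpath('//book[@category="sport"]')

# Ending the path with text() returns strings instead of elements.
sport_titles = root.xpath('//book[@category="sport"]/title/text()')

print(len(all_books), len(sport_books), sport_titles)
```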

Pip install failed in openshift 3

Submitted by 久未见 on 2019-12-25 18:31:13
Question: I want to use the new OpenShift 3 platform, but I can't install lxml for Weblate with pip when the build process is launched. The last line in the logs is "Running setup.py install for lxml", with no further error. How can I find out what happened? Thanks. Answer 1: Some data analytics packages, when compiled with compiler optimisations, can chew up too much memory and hit the default memory limit for builds. Try following the steps outlined in: Pandas on OpenShift v3 Is less likely, but just in case is the

Replace `\n` in html page with space in python LXML

Submitted by 本小妞迷上赌 on 2019-12-25 17:26:06
Question: I have a messy XML document and process it with the Python lxml module. Before any other processing, I want to replace every \n in the content with a space. How can I do this for the text of all elements? Edit: my XML example: <root> <a> dsdfs\n dsf\n sdf\n</a> <bds> <d>sdf\n\n\n\n\n\n</d> <d>sdf\n\n\nsdf\nsdf\n\n</d> </bds> .... .... .... .... </root> and I want to get this output when I print itertext: root = #get root element for i in root.itertext(): print i dsdfs dsf sdf dsdfs dsf sdf sdf nsdf sdf Answer 1: Below
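
The answer is cut off, so here is one possible approach rather than the original one: walk every node and rewrite both its text and its tail. The sketch assumes the \n sequences in the question stand for real newline characters, and the sample document is a shortened version of the question's XML.

```python
from lxml import etree

# Shortened sample; "\n" here is a real newline character (an assumption).
root = etree.fromstring(
    "<root><a> dsdfs\n dsf\n sdf\n</a>"
    "<bds><d>sdf\n\n\n</d><d>sdf\n\nsdf\nsdf\n</d></bds></root>"
)

# Text can live in .text (before the first child) and in .tail (after the
# closing tag), so both are rewritten for every node.
for node in root.iter():
    if node.text:
        node.text = node.text.replace("\n", " ")
    if node.tail:
        node.tail = node.tail.replace("\n", " ")

texts = list(root.itertext())
for text in texts:
    print(repr(text))
```

If the document contains the two literal characters backslash and n instead, the same loop applies with replace("\\n", " ").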

Generate xml documents using lxml and vary element text and attributes based on logic

Submitted by 和自甴很熟 on 2019-12-25 16:56:21
Question: My lxml code looks like this: from lxml import etree import sys fd = open('D:\\text.xml', 'wb') xmlns = "http://www.fpml.org/FpML-5/confirmation" xsi = "http://www.w3.org/2001/XMLSchema-instance" fpmlVersion="http://www.fpml.org/FpML-5/confirmation ../../fpml-main-5-6.xsd http://www.w3.org/2000/09/xmldsig# ../../xmldsig-core-schema.xsd" page = etree.Element("{"+xmlns+"}dataDocument",nsmap={None:xmlns,'xsi':xsi }) doc = etree.ElementTree(page) page.set("fpmlVersion", fpmlVersion) trade = etree
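
The question's code is truncated, so the sketch below only illustrates the general idea of varying text and attributes with ordinary Python logic. The function build_document, the tradeId element, the status attribute, and the is_correction flag are hypothetical names, and the result is returned as bytes instead of being written to D:\text.xml.

```python
from lxml import etree

XMLNS = "http://www.fpml.org/FpML-5/confirmation"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def build_document(trade_id, is_correction=False):
    """Build a small document whose content depends on the arguments."""
    root = etree.Element(
        "{%s}dataDocument" % XMLNS, nsmap={None: XMLNS, "xsi": XSI}
    )
    trade = etree.SubElement(root, "{%s}trade" % XMLNS)

    # Attribute value chosen by logic (hypothetical attribute name).
    trade.set("status", "corrected" if is_correction else "new")

    # Element text chosen by logic (hypothetical element name).
    trade_id_el = etree.SubElement(trade, "{%s}tradeId" % XMLNS)
    trade_id_el.text = str(trade_id)

    return etree.tostring(
        root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
    )


new_doc = build_document("T-1")
corrected_doc = build_document("T-2", is_correction=True)
print(corrected_doc.decode("utf-8"))
```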