lxml

Extracting nested namespace from an XML file using lxml

不羁的心 submitted on 2019-12-24 15:09:17
Question: I'm new to Python and currently learning to parse XML. All was going well until I hit a wall with nested namespaces. Below is a snippet of my XML, with a beginning and a child element that I'm trying to parse:

    <?xml version="1.0" encoding="UTF-8"?>
    <CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
      <!-- Generated by orca_wrapping version 3.8.3-0 -->
      <Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
      ------------- ------------- -------------
      <cc-cpl
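A minimal sketch of one common way to handle this, assuming the nested element lives in a second namespace. The excerpt is cut off before the nested element's namespace is shown, so the `ext` prefix, its URI, and the `Extension`/`Label` elements below are hypothetical stand-ins. The idea is to map a prefix of your choice to each namespace URI and pass that map to `findtext()` or `xpath()`.

```python
from lxml import etree

# Sample document: the default namespace is taken from the question;
# the nested "ext" namespace and its elements are hypothetical.
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<CompositionPlaylist xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#">
  <Id>urn:uuid:e0e43007-ca9b-4ed8-97b9-3ac9b272be7a</Id>
  <ext:Extension xmlns:ext="http://example.com/hypothetical-extension#">
    <ext:Label>demo</ext:Label>
  </ext:Extension>
</CompositionPlaylist>"""

root = etree.fromstring(xml)

# Prefixes here are local to this script; only the URIs must match the document.
ns = {
    "cpl": "http://www.digicine.com/PROTO-ASDCP-CPL-20040511#",
    "ext": "http://example.com/hypothetical-extension#",
}

# Element in the default namespace: it still needs a prefix in the query.
cpl_id = root.findtext("cpl:Id", namespaces=ns)

# Element in the nested namespace, found anywhere below the root.
label = root.xpath("//ext:Label/text()", namespaces=ns)[0]

print(cpl_id)
print(label)
```

Note that an unprefixed name such as `Id` would not match here, because in XPath an unprefixed name always means "no namespace".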

How to find text's Parent Node?

白昼怎懂夜的黑 submitted on 2019-12-24 14:33:19
Question: If I use:

    import requests
    from lxml import html

    response = requests.get(url='someurl')
    tree = html.document_fromstring(response.text)
    all_text = tree.xpath('//text()')  # gives all the text from the page

then the all_text list holds all the text from the page. Now, given:

    text_searched = all_text[all_text.index('any string which is in all_text list')]

is it possible to get to the web element that contains the searched text?

Answer 1: You can use the getparent() method for this purpose, for example:
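A small sketch of the getparent() approach, using an inline HTML string instead of a live request (the markup is made up for illustration). The strings returned by an lxml XPath `text()` query are "smart strings" that remember which element they came from.

```python
from lxml import html

# Hypothetical page content standing in for response.text.
page = "<html><body><div><p>Hello</p><span>World</span></div></body></html>"

tree = html.document_fromstring(page)
all_text = tree.xpath('//text()')

# Look up the string in the list, exactly as in the question.
text_searched = all_text[all_text.index('World')]

# XPath text results carry a reference to the element that holds them.
parent = text_searched.getparent()

print(parent.tag)             # tag of the element containing the text
print(text_searched.is_text)  # True if it is the element's .text, not a .tail
```

If the string is an element's tail text (text following a closing tag), `is_tail` is true and `getparent()` returns the element whose tail it is, not the enclosing element.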

Passing lxml output to BeautifulSoup

核能气质少年 submitted on 2019-12-24 13:33:15
Question: My offline code works fine, but I'm having trouble passing a web page from urllib via lxml to BeautifulSoup. I'm using urllib for basic authentication, then lxml to parse (it gives a good result with the specific pages we need to scrape), then passing the result to BeautifulSoup.

    #! /usr/bin/python
    import urllib.request
    import urllib.error
    from io import StringIO
    from bs4 import BeautifulSoup
    from lxml import etree
    from lxml import html

    file = open("sample.html")
    doc = file.read()
    parser = etree.HTMLParser()
    html

How to install libxml2 2.9.0 for lxml for Python 3.4.3 on win 7 64?

蓝咒 submitted on 2019-12-24 12:34:47
Question: I'm using lxml 3.4.2 for Python 3.4 on a Windows 7 64-bit computer. I got lxml from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml. One of its components is libxml2 2.9.2. I'm having a problem that a user of lxml 3.4.2 with libxml2 2.9.0 is not having, so I'd like to try libxml2 2.9.0, but I can't figure out how to install it. See "Python 2 v. 3 xpath" for more on the problem. I don't have the ability to compile from source. How can I install 2.9.0?

Answer 1: To use custom versions, you will almost certainly

xpath lookup via lxml starting from root rather than element

空扰寡人 submitted on 2019-12-24 12:23:58
Question: I want to do the same thing I do in Beautiful Soup: find_all elements, then iterate through them to find other elements inside each one, i.e.:

    soup = bs4.BeautifulSoup(source)
    articles = soup.find_all('div', class_='v-card')
    for article in articles:
        name = article.find('span', itemprop='name').text
        address = article.find('p', itemprop='address').text

Now I try to do the same thing in lxml:

    tree = html.fromstring(source)
    items = tree.xpath('//div[@class="v-card"]')
    for item in
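A sketch of the usual cause behind the behaviour named in the title, assuming the loop body used paths starting with `//`. In XPath, `//` at the start of an expression is absolute: it searches from the document root no matter which element `xpath()` is called on. Prefixing the path with a dot (`.//`) makes it relative to the current element. The sample markup below is made up.

```python
from lxml import html

# Hypothetical page with two cards.
source = """
<html><body>
  <div class="v-card"><span itemprop="name">Alice</span><p itemprop="address">1 Main St</p></div>
  <div class="v-card"><span itemprop="name">Bob</span><p itemprop="address">2 Side Rd</p></div>
</body></html>
"""

tree = html.fromstring(source)
items = tree.xpath('//div[@class="v-card"]')

# Relative paths (leading dot): searched inside each item only.
results = []
for item in items:
    name = item.xpath('.//span[@itemprop="name"]/text()')[0]
    address = item.xpath('.//p[@itemprop="address"]/text()')[0]
    results.append((name, address))

# Absolute paths (no dot): every item sees the whole document,
# so the first match is always the first card's name.
from_root = [item.xpath('//span[@itemprop="name"]/text()')[0] for item in items]

print(results)
print(from_root)
```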

How to delete duplicated elements in XML file

拈花ヽ惹草 submitted on 2019-12-24 12:02:02
Question: Here is my XML file. It contains a duplicated element, <houseNum>0</houseNum>.

    <?xml version="1.0" encoding="utf-8"?>
    <ArrayOfHouse>
      <XmlForm>
        <houseNum>0</houseNum>
        <plan1>
          <coord> <X> 1.2 </X> <Y> 2.1 </Y> <Z> 3.0 </Z> </coord>
          <color> <R> 255 </R> <G> 0 </G> <B> 0 </B> </color>
        </plan1>
        <plan2>
          <coord> <X> 21.2 </X> <Y> 22.1 </Y> <Z> 31.0 </Z> </coord>
          <color> <R> 255 </R> <G> 0 </G> <B> 0 </B> </color>
        </plan2>
      </XmlForm>
      <XmlForm>
        <houseNum>0</houseNum>
        <plan1>
          <coord> <X> 1.2 </X> <Y> 2
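One possible reading of the question, sketched under assumptions: the excerpt is cut off before it says what should be kept, so this example assumes that each `<XmlForm>` is a record and that only the first record per `houseNum` value should survive. The shortened sample data is made up.

```python
from lxml import etree

# Shortened, hypothetical version of the file from the question.
xml = b"""<ArrayOfHouse>
  <XmlForm><houseNum>0</houseNum><plan1><coord><X>1.2</X></coord></plan1></XmlForm>
  <XmlForm><houseNum>0</houseNum><plan1><coord><X>1.2</X></coord></plan1></XmlForm>
  <XmlForm><houseNum>1</houseNum><plan1><coord><X>5.0</X></coord></plan1></XmlForm>
</ArrayOfHouse>"""

root = etree.fromstring(xml)

seen = set()
# findall() returns a list, so removing elements while looping is safe.
for form in root.findall("XmlForm"):
    house_num = form.findtext("houseNum").strip()
    if house_num in seen:
        root.remove(form)      # a record with this houseNum was already kept
    else:
        seen.add(house_num)

house_nums = [form.findtext("houseNum") for form in root.findall("XmlForm")]
print(house_nums)
```

If two records should only count as duplicates when their whole content matches, a key built from `etree.tostring(form)` could replace `house_num`, though whitespace differences would then matter.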

how to iterate through xml data to remove next duplicate element using lxml

柔情痞子 submitted on 2019-12-24 11:36:25
Question: I am struggling to come up with a simple solution that iterates over XML data and removes the next element if it is a duplicate of the current one. Example: from this "input":

    <root>
      <b attrib1="abc" attrib2="def"> <c>data1</c> </b>
      <b attrib1="abc" attrib2="def"> <c>data2</c> </b>
      <b attrib1="uvw" attrib2="xyz"> <c>data3</c> </b>
      <b attrib1="abc" attrib2="def"> <c>data4</c> </b>
      <b attrib1="abc" attrib2="def"> <c>data5</c> </b>
      <b attrib1="abc" attrib2="def"> <c>data6</c> </b>
    </root>

I would
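A minimal sketch of removing consecutive duplicates, assuming that "duplicate" means the same tag and the same attributes as the previously kept sibling (the excerpt is cut off before the desired output, so this criterion is an assumption).

```python
from lxml import etree

xml = b"""<root>
  <b attrib1="abc" attrib2="def"><c>data1</c></b>
  <b attrib1="abc" attrib2="def"><c>data2</c></b>
  <b attrib1="uvw" attrib2="xyz"><c>data3</c></b>
  <b attrib1="abc" attrib2="def"><c>data4</c></b>
  <b attrib1="abc" attrib2="def"><c>data5</c></b>
  <b attrib1="abc" attrib2="def"><c>data6</c></b>
</root>"""

root = etree.fromstring(xml)

previous_key = None
# Iterate over a copy of the child list so removal does not disturb the loop.
for elem in list(root):
    key = (elem.tag, dict(elem.attrib))
    if key == previous_key:
        root.remove(elem)      # same tag and attributes as the element just kept
    else:
        previous_key = key     # start of a new run; keep it

remaining = [b.findtext("c") for b in root]
print(remaining)
```

Only runs of adjacent duplicates collapse: the fourth `<b>` is kept because the element before it has different attributes.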

lxml preserves attributes order?

自闭症网瘾萝莉.ら submitted on 2019-12-24 11:27:47
Question: I was writing my application using minidom, but minidom does not preserve attribute order (it sorts alphabetically), so I decided to do it using lxml. However, with the following lines of code I'm not getting the desired order:

    import lxml.etree as ET

    SATNS = "link_1"
    NS = "link_2"
    location_attribute = '{%s}schemaLocation' % NS
    root = ET.Element('{%s}Catalogo' % SATNS, nsmap={'catalogocuentas':SATNS}, attrib= {location_attribute: 'http://www.sat.gob.mx/catalogocuentas'}, Ano="2014", Mes="02",
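One way to control the order, sketched under assumptions: attributes added one at a time with `set()` are appended to the element and serialized in that order, so creating the element bare and then setting attributes in the desired sequence sidesteps any reordering of constructor arguments. The namespace URI below is a hypothetical stand-in for the question's "link_1" placeholder, and the chosen order (Mes before Ano) is only for demonstration.

```python
import lxml.etree as ET

# Hypothetical namespace URI; the question only shows a placeholder.
SATNS = "http://example.com/catalogocuentas"

# Create the element without attributes first.
root = ET.Element('{%s}Catalogo' % SATNS, nsmap={'catalogocuentas': SATNS})

# Then add attributes in exactly the order they should appear.
for name, value in [("Mes", "02"), ("Ano", "2014")]:
    root.set(name, value)

serialized = ET.tostring(root).decode()
print(list(root.attrib.keys()))
print(serialized)
```

Keep in mind that attribute order carries no meaning in XML itself, so this only matters when a consumer or a human reader expects a particular layout.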

How do I add attributes to elements with ElementMaker?

百般思念 submitted on 2019-12-24 09:37:02
Question: I have to generate an XML document like the one below:

    <?xml version='1.0' encoding='UTF-8' standalone='yes'?>
    <serviceConfiguration xmlns="http://blah.com/serviceConfiguration">
      <node name="node1">
        <hostName>host1</hostName>
        <networkInterface name="eth0">
          <ipv4Address>192.168.1.3</ipv4Address>
          <ipv6Address>2a00:4a00:a000:11a0::a4f:3</ipv6Address>
          <domainName>asdf.net</domainName>
          <ipv4Netmask>255.255.255.0</ipv4Netmask>
          <ipv6Netmask>ffff:ffff:ffff:ffff::</ipv6Netmask>
        </networkInterface>
        <userAccount>
          <uid
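A short sketch of adding attributes with `lxml.builder.ElementMaker`, covering only part of the target document. Attributes can be passed either as keyword arguments or as a dict placed among the children; the dict form also works for names that are not valid Python identifiers.

```python
from lxml import etree
from lxml.builder import ElementMaker

NS = "http://blah.com/serviceConfiguration"

# Every element built through E lands in the default namespace NS.
E = ElementMaker(namespace=NS, nsmap={None: NS})

config = E.serviceConfiguration(
    E.node(
        E.hostName("host1"),
        E.networkInterface(
            E.ipv4Address("192.168.1.3"),
            {"name": "eth0"},      # attributes given as a dict child
        ),
        name="node1",              # attributes given as keyword arguments
    )
)

xml_text = etree.tostring(
    config,
    pretty_print=True,
    xml_declaration=True,
    encoding="UTF-8",
    standalone=True,
).decode()
print(xml_text)
```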

XPath syntax and usage of the lxml library

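十年热恋 submitted on 2019-12-24 07:19:42
BeautifulSoup is already a very powerful library, but there are other popular parsing libraries too, such as lxml, which uses XPath syntax and is likewise an efficient way to parse documents.

1. Installation

    pip install lxml

2. XPath syntax

XPath is a language for finding information in XML documents. It can be used to navigate through the elements and attributes of an XML document. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions.

(1) Selecting nodes: XPath uses path expressions to select nodes in an XML document. Nodes are selected by following a path, or steps.

The most useful path expressions are listed below:

    Expression   Description
    nodename     Selects the child elements of the current node that are named nodename.
    /            Selects from the root node.
    //           Selects nodes from the current node that match the selection, regardless of their position in the document.
    .            Selects the current node.
    ..           Selects the parent of the current node.
    @            Selects attributes.

Examples

The table below lists some path expressions and their results:

    Path expression   Result
    bookstore         Selects the bookstore child elements of the current node.
    /bookstore        Selects the root element bookstore. Note: a path that starts with a forward slash ( / ) always represents an absolute path to an element!
    bookstore/book    Selects all book elements that are children of bookstore.
    /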
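The path expressions above can be tried out with lxml. This is a small sketch on a made-up bookstore document (the titles, prices, and `lang` attribute are assumptions for illustration), evaluating one expression of each kind.

```python
from lxml import etree

# Hypothetical sample document.
root = etree.fromstring(b"""<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>""")

# /bookstore : absolute path, selects the root element.
root_match = root.xpath("/bookstore")

# book : child elements of the current node (root) named "book".
books = root.xpath("book")

# //title : title elements anywhere in the document.
titles = [t.text for t in root.xpath("//title")]

# @lang : attribute values.
langs = root.xpath("//title/@lang")

# . and .. : the current node and its parent, relative to the first book.
first_book = books[0]
current_tag = first_book.xpath(".")[0].tag
parent_tag = first_book.xpath("..")[0].tag

# A bare name is relative: the root has no child called "bookstore".
no_match = root.xpath("bookstore")

print(titles, list(langs), current_tag, parent_tag, no_match)
```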