lxml

Why does lxml.html sometimes swallow/remove whitespace instead of preserving it?

橙三吉。 posted on 2020-01-03 05:27:07
Question: Given the following code, one might reasonably expect almost exactly the same string of HTML that was fed into lxml to be spat back out.

    from lxml import html

    HTML_TEST_STRING = r""" <pre> <em>abc</em> <em>def</em> <sub>ghi</sub> <sub>jkl</sub> <em>mno</em> <em>pqr</em> </pre> """

    parser = html.HTMLParser( remove_blank_text=False )
    doc = html.fromstring( HTML_TEST_STRING, parser=parser )
    html_out_string = html.tostring( doc, encoding="unicode" )  # serialize the parsed tree back to HTML
    print( html_out_string )

Instead, even though everything is contained within a <pre> pre-formatted code

URL requests with urllib, requests with the requests library, the BeautifulSoup parsing library, the lxml parsing library

混江龙づ霸主 posted on 2020-01-03 04:11:41
URL request with urllib

    from urllib.request import urlopen

    url = "****"
    response = urlopen(url)
    content = response.read()
    content = content.decode('utf-8')
    print(content)

Request with requests

    import requests

    url = "***"
    headers = {
        'Accept': '*/*',
        'Accept-Language': 'en-US,en;q=0.8',
        'Cache-Control': 'max-age=0',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
        'Connection': 'keep-alive',
        'Referer': 'http://www.baidu.com/',
    }
    # The headers are added so the request looks like it comes from a browser
    res = requests.get(url, headers=headers)
    print(res.status_code)  # print the status code
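The excerpt stops before the parsing step named in the title. As a rough sketch of what parsing the downloaded page with lxml could look like, the block below runs two XPath queries over an HTML string. The markup in `content` is a made-up stand-in for the decoded response body, not something from the original post.

```python
from lxml import html

# Hypothetical stand-in for the `content` string fetched by urlopen/requests.
content = """
<html><body>
  <h1>Example page</h1>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""

# Parse the string into an element tree.
doc = html.fromstring(content)

# XPath queries return plain Python lists.
title = doc.xpath("//h1/text()")[0]
links = doc.xpath("//a/@href")

print(title)
print(links)
```

With a real response, `content` would be the decoded body (or `res.text` when using requests).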

Parsing XML with Python - accessing elements

心已入冬 posted on 2020-01-02 19:14:56
Question: I'm using lxml to parse some XML, but for some reason I can't find a specific element. I'm trying to access the <Constant> elements. Here's an XML snippet:

    </rdf:Description>
    </rdf:RDF>
    </MiriamAnnotation>
    <ListOfSubstrates>
      <Substrate metabolite="Metabolite_5" stoichiometry="1"/>
    </ListOfSubstrates>
    <ListOfModifiers>
      <Modifier metabolite="Metabolite_9" stoichiometry="1"/>
    </ListOfModifiers>
    <ListOfConstants>
      <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
      <Constant key=
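The excerpt does not show the root element, so the actual cause is unknown. One common reason an element "cannot be found" is a default namespace declared further up the tree. The sketch below assumes that is the case; the `Reaction` wrapper and the namespace URI `http://example.com/model` are hypothetical. It uses only calls that lxml shares with the standard library's ElementTree, so it falls back to the latter if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # same API for the calls used here
    import xml.etree.ElementTree as etree

# Hypothetical default namespace; the real document's namespace is not shown.
NS = "http://example.com/model"

XML = (
    '<Reaction xmlns="%s">'
    "<ListOfConstants>"
    '<Constant key="Parameter_4344" name="Kcat" value="433.724"/>'
    "</ListOfConstants>"
    "</Reaction>" % NS
)

root = etree.fromstring(XML)

# A bare tag name does not match elements that live in a namespace.
unqualified = root.findall(".//Constant")

# Qualifying the tag with {namespace} (Clark notation) does match.
qualified = root.findall(".//{%s}Constant" % NS)

constants = {c.get("name"): float(c.get("value")) for c in qualified}
print(len(unqualified), constants)
```

If the real file has no namespace, the unqualified search should already work and the problem lies elsewhere, for example in the path used to reach `ListOfConstants`.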

How to fix "failed building wheel for xxx" when installing Python packages such as scipy and scrapy with pip

只谈情不闲聊 posted on 2020-01-02 11:53:39
This article is reposted from: https://www.cnblogs.com/harvey888/p/5467276.html Author: harvey888. Please keep this notice when reposting.

1. Download the matching .whl file here, and be careful not to rename it! http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Press Ctrl + F, type lxml, and find the following section:

Lxml, a binding for the libxml2 and libxslt libraries.
lxml-3.4.4-cp27-none-win32.whl
lxml-3.4.4-cp27-none-win_amd64.whl
lxml-3.4.4-cp33-none-win32.whl
lxml-3.4.4-cp33-none-win_amd64.whl
lxml-3.4.4-cp34-none-win32.whl
lxml-3.4.4-cp34-none-win_amd64.whl
lxml-3.4.4-cp35-none-win32.whl
lxml-3.4.4-cp35-none-win_amd64.whl

The number after cp is the Python version (27 means 2.7); choose the download that matches your Python version.

2. Go straight to the directory that contains pip, c:\python34\scripts, and copy all the .whl files you want to install into it.

python lxml inkscape namespace tags

≯℡__Kan透↙ posted on 2020-01-02 05:57:56
Question: I am generating an SVG file that's intended to include Inkscape-specific attributes, for example inkscape:label and inkscape:groupmode. I am using lxml etree as my parser/generator. I'd like to add the label and groupmode attributes to the following instance:

    layer = etree.SubElement(svg_instance, 'g', id="layer-id")

My question is how to achieve that so that the SVG output has the correct form, for example:

    <g inkscape:groupmode="layer" id="layer-id" inkscape:label="layer-label">

Answer 1: First,
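The answer is cut off in this excerpt, so the following is only one possible approach, not necessarily the one the answer goes on to describe. Namespaced attributes can be set with Clark notation, `{namespace-uri}name`, once the `inkscape` prefix is registered. The namespace URI below is the one Inkscape uses in its own files; it does not appear in the excerpt. The bare `svg` root is a simplified placeholder. The calls used exist in both lxml and the standard library's ElementTree, so the sketch falls back to the latter if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:  # same API for the calls used here
    import xml.etree.ElementTree as etree

# Inkscape's namespace URI (an assumption, not shown in the excerpt).
INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# Ask the serializer to use the "inkscape" prefix for that namespace.
etree.register_namespace("inkscape", INKSCAPE_NS)

svg_instance = etree.Element("svg")  # simplified placeholder root
layer = etree.SubElement(svg_instance, "g", id="layer-id")

# Clark notation: "{namespace-uri}local-name".
layer.set("{%s}groupmode" % INKSCAPE_NS, "layer")
layer.set("{%s}label" % INKSCAPE_NS, "layer-label")

svg_text = etree.tostring(svg_instance).decode()
print(svg_text)
```

With lxml specifically, another option is to pass an `nsmap` mapping such as `{"inkscape": INKSCAPE_NS}` when creating the root element, which places the `xmlns:inkscape` declaration on the root.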

xpath: How do we select just the very last text node?

两盒软妹~` posted on 2020-01-02 05:23:10
Question: How do I select the globally last text node using XPath? I tried this, but it gives me the last text node in every context of the document.

    lxml.html.fromstring('1<a>2<b>3</b>4<c>5</c>6</a>').xpath('//text()[last()]')
    ['1', '3', '5', '6']

I can do this, but it's inefficient in both time and space, especially as the document gets large.

    lxml.html.fromstring('1<a>2<b>3</b>4<c>5</c>6</a>').xpath('//text()[last()]')[-1]
    '6'

I tried to use an index of -1, but that gives me an empty list. I tried to use
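The question is truncated here, so this is a sketch of one standard XPath idiom rather than the thread's accepted answer. In `//text()[last()]` the predicate is evaluated once per parent, which is why one node per context comes back. Wrapping the path in parentheses first builds the whole node-set in document order and then applies `last()` to it.

```python
from lxml import html

doc = html.fromstring('1<a>2<b>3</b>4<c>5</c>6</a>')

# Parentheses make [last()] apply to the complete node-set of text nodes,
# not to the text children of each individual parent.
last_text = doc.xpath('(//text())[last()]')

print(last_text)
```

The result is still a list, but it holds a single node, so no full list of text nodes is built on the Python side.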

Fail to install lxml using pip

倾然丶 夕夏残阳落幕 posted on 2020-01-02 03:24:09
Question: This is the command I used to install lxml:

    sudo pip install lxml

And I got the following message in the Cleaning Up stage:

    Cleaning up... Command /usr/bin/python -c "import setuptools, tokenize;__file__='/private/tmp/pip_build_root/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-rUFjFN-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /private

lxml: insert tag at a given position

你说的曾经没有我的故事 posted on 2020-01-02 02:26:08
Question: I have an XML file, similar to this:

    <tag attrib1='I'>
      <subtag1 subattrib1='1'>
        <subtext>text1</subtext>
      </subtag1>
      <subtag3 subattrib3='3'>
        <subtext>text3</subtext>
      </subtag3>
    </tag>

I would like to insert a new subelement, so the result would be something like this:

    <tag attrib1='I'>
      <subtag1 subattrib1='1'>
        <subtext>text1</subtext>
      </subtag1>
      <subtag2 subattrib2='2'>
        <subtext>text2</subtext>
      </subtag2>
      <subtag3 subattrib3='3'>
        <subtext>text3</subtext>
      </subtag3>
    </tag>

I can append my xml
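The excerpt ends before any answer, so the following is a minimal sketch of one way to do this. Elements behave like lists of their children, and `insert(index, element)` places a new child at a given position. The example rebuilds the document from the question and inserts `subtag2` just before `subtag3`. It relies only on calls shared by lxml and the standard library's ElementTree, and falls back to the latter if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # same API for the calls used here
    import xml.etree.ElementTree as etree

XML = (
    "<tag attrib1='I'>"
    "<subtag1 subattrib1='1'><subtext>text1</subtext></subtag1>"
    "<subtag3 subattrib3='3'><subtext>text3</subtext></subtag3>"
    "</tag>"
)
root = etree.fromstring(XML)

# Build the new element together with its child.
subtag2 = etree.Element("subtag2", subattrib2="2")
etree.SubElement(subtag2, "subtext").text = "text2"

# Locate the child the new element should precede, then insert at that index.
position = [child.tag for child in root].index("subtag3")
root.insert(position, subtag2)

order = [child.tag for child in root]
print(order)
print(etree.tostring(root).decode())
```

If the position is known in advance, `root.insert(1, subtag2)` does the same thing. lxml elements also offer `addprevious()` and `addnext()` for inserting relative to a sibling.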