lxml

Python 3.4.0 — 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128) — Unix 14.04

北战南征 submitted on 2019-11-28 10:36:46
Question: Trying to retrieve some data from the web using urllib and lxml, I got an error and have no idea how to fix it. url='http://sum.in.ua/?swrd=автор' page = urllib.request.urlopen(url) The error itself: UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128) I'm using Ukrainian in the API this time, but when I use the API (without any Ukrainian letters in it) here: url="http://www.toponymic-dictionary.in.ua/index.php?option=com_content&view=section
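
The traceback is consistent with the Cyrillic query value reaching urlopen unescaped: in the request line "GET /?swrd=автор" the five non-ASCII characters sit at positions 11-15. A minimal sketch of one way around it, assuming the server expects UTF-8 percent-encoding (the helper name to_ascii_url is hypothetical, not part of urllib):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep the structural characters, and '%' so existing escapes survive.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = to_ascii_url('http://sum.in.ua/?swrd=автор')
print(url)
```

The resulting string is pure ASCII and can be passed to urllib.request.urlopen(url) as in the question.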

How to create a sub-element through a variable in Python 3.6.5

人走茶凉 submitted on 2019-11-28 10:26:56
Question: My code is: import xml.etree.ElementTree as ET from lxml import etree var1 = '<name>This is my text</name>' page = etree.Element('first') doc = etree.ElementTree(page) second = etree.SubElement(page, 'second') second.text = var1 doc.write('a.xml', xml_declaration=True, encoding='utf-8') My output is: <?xml version='1.0' encoding='UTF-8'?> <first><second><name>This is my text</name></second></first> My Desired Output is: <?xml version='1.0' encoding='UTF-8'?> <first><second><name>This is my
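
The excerpt is cut off before the desired output is complete, so the following is only a sketch under the assumption that the goal is a real <name> child element rather than escaped text. Assigning markup to .text stores it as character data; parsing the string first and appending the result yields an element. The import falls back to the standard library, whose ElementTree API matches for these calls:

```python
try:
    from lxml import etree
except ImportError:  # fallback with the same ElementTree-style API
    import xml.etree.ElementTree as etree

var1 = '<name>This is my text</name>'

page = etree.Element('first')
second = etree.SubElement(page, 'second')
# Parse the markup into an element instead of assigning it to .text,
# which would store (and escape) it as plain character data.
second.append(etree.fromstring(var1))

result = etree.tostring(page, encoding='unicode')
print(result)
```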

Can't install lxml on Ubuntu 12.04

醉酒当歌 submitted on 2019-11-28 10:11:50
I've been trying to install lxml using pip install lxml and I get the error below. I've used apt-get install python-dev libxml2 libxml2-dev libxslt-dev before (suggested in other answers) but I still get the same error. I did not use control-c. pip install lxml Downloading/unpacking lxml Downloading lxml-3.2.4.tar.gz (3.3MB): 3.3MB downloaded Running setup.py egg_info for package lxml /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url' warnings.warn(msg) Building lxml version 3.2.4. Building without Cython. Using build configuration of libxslt 1.1

Python XML Remove Some Elements and Their Children but Keep Specific Elements and Their Children

≡放荡痞女 submitted on 2019-11-28 10:08:32
Question: I have a very large .xml file and I am trying to make a new .xml file that contains just a small part of the larger file's contents. I want to specify an attribute (in my case, an itemID), give it a few specific values, and strip away all the elements except the ones that have those itemIDs and their children. My large .xml file looks something like this: <?xml version='1.0' encoding='UTF-8'?> <api version="2"> <currentTime>2013-02-27 17:00:18</currentTime> <result> <rowset
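
Since the sample file is truncated at <rowset, the structure below is a hypothetical stand-in (nested row elements carrying itemID attributes), and prune is an illustrative helper name. The sketch walks the tree, drops every element whose itemID is not wanted, and leaves wanted elements untouched together with all of their children:

```python
try:
    from lxml import etree
except ImportError:  # fallback with the same ElementTree-style API
    import xml.etree.ElementTree as etree

# Hypothetical sample; the real file in the question is cut off.
SAMPLE = (
    '<api version="2"><result><rowset name="assets">'
    '<row itemID="1"><rowset name="contents"><row itemID="11"/></rowset></row>'
    '<row itemID="2"/>'
    '<row itemID="3"/>'
    '</rowset></result></api>'
)


def prune(parent, keep_ids):
    """Remove children with an unwanted itemID; keep wanted ones whole."""
    for child in list(parent):  # copy, because we remove while looping
        item_id = child.get('itemID')
        if item_id is None:
            prune(child, keep_ids)   # structural element: look deeper
        elif item_id not in keep_ids:
            parent.remove(child)     # unwanted item: drop it and its subtree
        # otherwise: wanted item, kept together with all of its children


root = etree.fromstring(SAMPLE)
prune(root, {'1', '3'})
remaining = [row.get('itemID') for row in root.iter('row')]
print(remaining)
```

For a file too large to hold in memory, an incremental parser such as iterparse would be the usual next step; this sketch assumes the tree fits.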

saving an 'lxml.etree._ElementTree' object

回眸只為那壹抹淺笑 submitted on 2019-11-28 09:57:55
I've spent the last couple of days getting to grips with the basics of lxml; in particular using lxml.html to parse websites and create an ElementTree of the content. Ideally, I want to save the returned ElementTree so that I can load it up and experiment with it, without having to parse the website every time I modify my script. I assumed that pickling would be the way to go, however I'm now beginning to wonder. Although I am able to retrieve an ElementTree object after pickling... type(myObject) returns <class 'lxml.etree._ElementTree'> the object itself appears to be 'empty', since none of
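
One alternative to pickling is to serialize the tree to a file and parse that file back later. A minimal sketch, assuming a small stand-in document instead of a downloaded page (the file name page.xml is arbitrary):

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # fallback with the same ElementTree-style API
    import xml.etree.ElementTree as etree

# Stand-in for a tree that was built by parsing a website.
root = etree.fromstring('<html><body><p>cached page</p></body></html>')
tree = etree.ElementTree(root)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'page.xml')
    tree.write(path, encoding='utf-8')   # save the serialized markup
    reloaded = etree.parse(path)         # load it again in a later run

text = reloaded.getroot().findtext('.//p')
print(text)
```

For trees produced by lxml.html, writing with method="html" and reloading with lxml.html.parse would be the closer equivalent.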

How to install lxml on Windows

流过昼夜 submitted on 2019-11-28 09:42:39
I'm trying to install lxml on my Windows 8.1 laptop with Python 3.4 and failing miserably. First off, I tried the simple and obvious solution: pip install lxml. However, this didn't work. Here's what it said: Downloading/unpacking lxml Running setup.py (path:C:\Users\CARTE_~1\AppData\Local\Temp\pip_build_carte_000\lxml\setup.py) egg_info for package lxml Building lxml version 3.4.2. Building without Cython. ERROR: b"'xslt-config' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n" ** make sure the development packages of libxml2 and libxslt are

How to re-install lxml?

狂风中的少年 submitted on 2019-11-28 09:06:33
I am using Python 2.7.5 on Mac OS X 10.7.5 with beautifulsoup 4.2.1. I am going to parse an XML page using the lxml library, as taught in the beautifulsoup tutorial. However, when I run my code, it shows bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml,xml. Do you need to install a parser library? I am sure that I have already installed lxml by every method: easy_install, pip, port, etc. I tried adding a line to my code to see whether lxml is installed or not: import lxml Then Python runs through this line successfully and displays the previous error message again,
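
A plain import lxml can succeed even when the parser is unusable, because the lxml package itself is pure Python while lxml.etree is a compiled extension. A small diagnostic sketch (describe_module is a hypothetical helper) that reports which interpreter is running and whether each module really imports:

```python
import importlib
import sys


def describe_module(name):
    """Report where a module was loaded from, or why the import failed."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return '{}: not importable ({})'.format(name, exc)
    return '{}: {}'.format(name, getattr(module, '__file__', 'built-in'))


print('interpreter:', sys.executable)
for name in ('lxml', 'lxml.etree', 'bs4'):
    print(describe_module(name))
```

If lxml.etree fails here, or the paths point at a different Python installation than the one pip or port installed into, that mismatch is a likely cause of the FeatureNotFound error.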

How can I strip namespaces out of an lxml tree?

跟風遠走 submitted on 2019-11-28 08:46:16
Following on from Removing child elements in XML using python ... Thanks to @Tichodroma, I have this code: If you can use lxml , try this: import lxml.etree tree = lxml.etree.parse("leg.xml") for dog in tree.xpath("//Leg1:Dog", namespaces={"Leg1": "http://what.not"}): parent = dog.xpath("..")[0] parent.remove(dog) parent.text = None tree.write("leg.out.xml") Now leg.out.xml looks like this: <?xml version="1.0"?> <Leg1:MOR xmlns:Leg1="http://what.not" oCount="7"> <Leg1:Order> <Leg1:CTemp id="FO"> <Leg1:Group bNum="001" cCount="4"/> <Leg1:Group bNum="002" cCount="4"/> </Leg1:CTemp> <Leg1:CTemp
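
One common approach is to rewrite each tag and attribute name to its local part and then drop the unused declarations. A hedged sketch using a shortened version of the document above; local_name and strip_namespaces are illustrative helper names, and cleanup_namespaces is only called when the lxml import succeeded:

```python
try:
    from lxml import etree
except ImportError:  # fallback with the same ElementTree-style API
    import xml.etree.ElementTree as etree

XML = (
    '<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">'
    '<Leg1:Order><Leg1:CTemp id="FO">'
    '<Leg1:Group bNum="001" cCount="4"/>'
    '</Leg1:CTemp></Leg1:Order></Leg1:MOR>'
)


def local_name(qualified):
    """'{uri}name' -> 'name'; names without a namespace pass through."""
    return qualified.rpartition('}')[2]


def strip_namespaces(root):
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # comments and processing instructions have no tag name
        element.tag = local_name(element.tag)
        for name, value in list(element.attrib.items()):
            if name.startswith('{'):
                del element.attrib[name]
                element.set(local_name(name), value)
    if hasattr(etree, 'cleanup_namespaces'):
        etree.cleanup_namespaces(root)  # lxml only: drop unused xmlns declarations
    return root


root = strip_namespaces(etree.fromstring(XML))
tags = [element.tag for element in root.iter()]
serialized = etree.tostring(root, encoding='unicode')
print(serialized)
```

Stripping namespaces can make previously distinct names collide, so this suits documents where a single namespace is used throughout.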

[Scraping practice]

梦想与她 submitted on 2019-11-28 08:08:44
Exercise 1: scrape the content of the environmental steward (环保管家) page on the IoT portal site:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.ioteis.com/Stewardship.html'
    response_data = requests.get(url)
    response_data.encoding = 'utf-8'
    # Parse the HTML page
    soup = BeautifulSoup(response_data.text, 'lxml')
    # Inspection shows the content sits in the divs under content1
    for hbgj in soup.select(".content1"):
        title = hbgj.select("div.title1 a")[0].text
        content = hbgj.select('div.ptext')[0].text
        print("Title: {}, content: {}".format(title, content))

Source: https://www.cnblogs.com/benpao1314/p/11401322.html

How to add a namespace to an attribute in lxml

a 夏天 submitted on 2019-11-28 07:06:13
Question: I'm trying to create an XML entry that looks like this using Python and lxml: <resource href="Unit 4.html" adlcp:scormtype="sco"> I'm having trouble with the adlcp:scormtype attribute. I'm new to XML, so please correct me if I'm wrong. adlcp is a namespace and scormtype is an attribute that is defined in the adlcp namespace, right? I'm not even sure if this is the right question but... My question is, how do I add an attribute to an element from a non-default
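
Namespaced attributes are addressed in ElementTree-style APIs with the "{uri}name" form (Clark notation), while the prefix is only a serialization detail. A minimal sketch; the namespace URI below is an assumption (the question does not give one), so substitute whatever URI your manifest declares for adlcp:

```python
# Assumed URI for the adlcp prefix; not stated in the question.
ADLCP_NS = 'http://www.adlnet.org/xsd/adlcp_rootv1p2'

try:
    from lxml import etree
    # lxml: declare the prefix on the element through nsmap.
    resource = etree.Element('resource', nsmap={'adlcp': ADLCP_NS})
except ImportError:
    import xml.etree.ElementTree as etree
    # Standard library fallback: register the prefix globally instead.
    etree.register_namespace('adlcp', ADLCP_NS)
    resource = etree.Element('resource')

resource.set('href', 'Unit 4.html')
# The attribute name is written as '{namespace-uri}local-name'.
resource.set('{%s}scormtype' % ADLCP_NS, 'sco')

xml_text = etree.tostring(resource, encoding='unicode')
print(xml_text)
```

In a full manifest the prefix would normally be declared once on the root element, and child elements created under it inherit the declaration.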