lxml | 易学教程

Python - 'ascii' codec can't decode byte \xbd in position


Question: I'm using lxml to scrape some text from web pages. Some of the text includes fractions, for example:

    5½

I need to get this into a float format. These fail:

    ugly_fraction.encode('utf-8')  # doesn't change to usable format
    ugly_fraction.replace('\xbd', '')  # throws error
    ugly_fraction.encode('utf-8').replace('\xbd', '')  # throws error

Answer 1: unicodedata.numeric: Returns the numeric value assigned to the Unicode character unichr as float. If no such value is defined, default is returned, or, if not given,
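The answer excerpt points at `unicodedata.numeric`. A minimal sketch of how it might be applied to a string like the one in the question, assuming Python 3 (where strings are already Unicode) and assuming the input contains only ASCII digits followed by a single fraction character. The helper name `mixed_number_to_float` is hypothetical and not part of the original answer.

```python
import unicodedata


def mixed_number_to_float(text):
    """Convert a string such as '5' followed by a vulgar-fraction character to a float."""
    whole_digits = ""
    fraction = 0.0
    for ch in text.strip():
        if ch in "0123456789":
            # Plain ASCII digits form the whole-number part.
            whole_digits += ch
        else:
            # unicodedata.numeric raises ValueError for characters
            # that have no numeric value (spaces, letters, ...).
            fraction += unicodedata.numeric(ch)
    whole = int(whole_digits) if whole_digits else 0
    return whole + fraction


# "\u00bd" is the one-half character from the question.
print(mixed_number_to_float("5\u00bd"))  # prints 5.5
```

Inputs with embedded spaces or other separators would need extra handling, since this sketch lets the `ValueError` propagate.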

How to install lxml for PyPy?


Question: I've created a virtualenv for PyPy with:

    virtualenv test -p `which pypy`
    source test/bin/activate

I installed the following dependencies:

    sudo apt-get install python-dev libxml2 libxml2-dev libxslt-dev

And then I run:

    pip install --upgrade pypy

As a result I get a lot of errors looking like this:

    src/lxml/lxml.etree.c:234038:22: error: 'PyThreadState {aka struct _ts}' has no member named 'use_tracing'

How do I properly install lxml for PyPy 2.6.0?

Answer 1: I used the following fork of lxml for

Creating xml from MySQL query with Python and lxml


Question: I am trying to use Python and lxml to create an XML file from a MySQL query result. Here is the format I want:

    <DATA>
      <ROW>
        <FIELD1>content</FIELD1>
        <FIELD2>content</FIELD2>
      </ROW>
    </DATA>

For some reason the code isn't formatting right and the XML will not validate. Here is that code:

    from lxml import etree
    from lxml.etree import tostring
    from lxml.builder import E
    import MySQLdb

    try:
        conn = MySQLdb.connect(host = 'host',user = 'user',passwd = 'pass',db = 'db')
        cursor = conn.cursor()
    except:
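The question's code is cut off before the XML is built, so the following is only a sketch of one way to produce the DATA/ROW/FIELD layout, not the asker's code or the accepted answer. It assumes the query result is already available as a list of tuples plus a list of column names (as `cursor.fetchall()` and `cursor.description` could supply). The function name `rows_to_xml` and the sample data are hypothetical. It falls back to the standard library's ElementTree, which offers the same `Element`/`SubElement`/`tostring` calls, when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # same calls exist in the standard library
    import xml.etree.ElementTree as etree


def rows_to_xml(column_names, rows):
    """Build <DATA><ROW><COL>value</COL>...</ROW>...</DATA> from query rows."""
    root = etree.Element("DATA")
    for row in rows:
        row_elem = etree.SubElement(root, "ROW")
        for name, value in zip(column_names, row):
            # Column names must be valid XML element names.
            field = etree.SubElement(row_elem, name)
            # Text content is escaped by the serializer, not by hand.
            field.text = str(value)
    return etree.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for a MySQL result set.
print(rows_to_xml(["FIELD1", "FIELD2"], [("content", 1), ("a & b", 2)]))
```

Building elements through the tree API rather than string concatenation lets the serializer handle escaping, which is a common reason hand-assembled XML fails validation.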

Parsing web pages in a Python crawler with the lxml module


Parsing web pages in a Python crawler with the lxml module

1. Installing the module

On Windows:
Method 1: pip3 install lxml
Method 2: download the wheel file matching your system from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml, then run pip3 install lxml-4.2.1-cp36-cp36m-win_amd64.whl (using the path where the file was saved)

On Linux:
Method 1: pip3 install lxml
Method 2: yum install -y epel-release libxslt-devel libxml2-devel openssl-devel

2. Using the module

    from lxml.html import etree

Demo:

    import requests
    from lxml.html import etree
    rp = requests.get('http://www.baidu.com')
    html = etree.HTML(rp.text)  # the parsed object can then be queried with XPath

Source: CSDN. Author: 学习-永无止境. Link: https://blog.csdn.net/weixin_45974628

Cannot install lxml 3.3.3 on OSX 10.9 with buildout


Question: I have seen numerous related posts but have not had any luck getting this to work. The log shows:

    We have no distributions for lxml that satisfies 'lxml'.
    Getting distribution for 'lxml'.
    Running easy_install: "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python" "-c" "from setuptools.command.easy_install import main; main()" "-mZUNxd" "/Users/brad/Development/python/eggs/tmpLpoVC3" "-q" "/var/folders/3c/mdys56lx2wbf9rjlmqy4nhy40000gn/T/tmpZx0t7aget

builtins.TypeError: reading file objects must return plain strings : Error in Xpath - python


Question: Here's my code:

    import os
    os.chdir('d:/py/xml/')
    from lxml import etree
    from io import StringIO

    #----------------------------------------------------------------------
    def parseXML(xmlFile):
        """
        Parse the xml
        """
        f = open(xmlFile)
        xml = f.read()
        f.close()

        tree = etree.parse(StringIO(xml))
        context = etree.iterparse(StringIO(xml))
        for action, elem in context:
            if not elem.text:
                text = 'None'
            else:
                text = elem.text
            print (elem.tag + ' => ' + text)

    if __name__ == "__main__":
        parseXML("example.xml
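The excerpt stops before any answer. The error in the title is commonly attributed to `iterparse` receiving a text stream (`StringIO`) where it expects a file object that returns bytes. Under that assumption, a minimal sketch is to read the data as bytes and wrap it in `BytesIO`. The function name `iter_tag_text` and the sample XML are hypothetical. The standard library's ElementTree is used as a fallback because its `iterparse` accepts the same kind of input.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # the standard library offers a compatible iterparse
    import xml.etree.ElementTree as etree


def iter_tag_text(xml_bytes):
    """Return (tag, text) pairs in the order elements are closed."""
    results = []
    # BytesIO yields bytes on read(), which is what iterparse expects.
    for action, elem in etree.iterparse(BytesIO(xml_bytes)):
        text = elem.text if elem.text else "None"
        results.append((elem.tag, text))
    return results


# Hypothetical sample document; a real file would be opened with mode "rb".
sample = b"<root><a>1</a><b/></root>"
for tag, text in iter_tag_text(sample):
    print(tag + " => " + text)
```

Passing the file name itself to `etree.iterparse` is another option that avoids the intermediate stream entirely.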

Word Breaks in text extraction , Lxml Xpath


Question: I want to extract words with strikethroughs, i.e. those inside the <w:delText> tag. The expression I used extracts them successfully, except that some words come out broken. For example, the word "They" appears as 'T' and 'hey'. Below is an XML sample where the problem occurs:

    <w:delText xml:space="preserve">. </w:delText></w:r>
    <w:r w:rsidR="0020338C" w:rsidDel="00147CFE">
      <w:rPr>
        <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/>
        <w:sz w:val="24"/>
      </w:rPr>
      <w:delText>T</w
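The sample suggests that one deleted word is split across several `w:r` runs, so each `w:delText` node holds only a piece of it. One possible approach, sketched here under that assumption, is to concatenate all `w:delText` pieces within each paragraph before splitting into words. The function name `deleted_text_by_paragraph` and the sample XML are hypothetical, and the excerpt does not show which answer the original thread settled on.

```python
try:
    from lxml import etree
except ImportError:  # the calls used here also exist in the standard library
    import xml.etree.ElementTree as etree

# WordprocessingML namespace bound to the "w" prefix in .docx documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_text_by_paragraph(xml_bytes):
    """Join the w:delText pieces of each w:p paragraph into one string."""
    root = etree.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter("{%s}p" % W_NS):
        pieces = [t.text or "" for t in para.iter("{%s}delText" % W_NS)]
        if pieces:
            paragraphs.append("".join(pieces))
    return paragraphs


# Hypothetical, simplified sample: "They" split across two runs.
sample = (
    b'<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    b"<w:r><w:delText>T</w:delText></w:r>"
    b"<w:r><w:delText>hey</w:delText></w:r>"
    b"</w:p>"
)
print(deleted_text_by_paragraph(sample))  # prints ['They']
```

Splitting the joined string on whitespace afterwards gives whole words, because the run boundaries no longer matter once the pieces are concatenated.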

Are there any benefits of using Beautiful Soup to parse XML over using lxml alone?


Question: I often use Beautiful Soup to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. When researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError. This leads me to my question: if lxml can handle large files and Beautiful Soup cannot, are there any benefits to using Beautiful Soup instead of simply using lxml directly?

Answer 1: If you look at this

Installing lxml, libxml2, libxslt for Python 3.5 on Windows 10


Question: I first tried to run the basic pip install command for it:

    C:\Program Files (x86)\Python35-32>pip install lxml
    Collecting lxml
      Using cached lxml-3.6.4.tar.gz
    Building wheels for collected packages: lxml
      Running setup.py bdist_wheel for lxml ... error
      Complete output from command "c:\program files (x86)\python35-32\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\Users\\Djidiouf\\AppData\\Local\\Temp\\pip-build-ovqa6ncd\\lxml\\setup.py';f=getattr(tokenize, 'open', open)(__file__)

html parsing with lxml when there's no root tag


Question: I've been building a scaffolding library for sqlalchemy using lxml and formalchemy, and I'm having a hard time getting them to play nicely. Specifically, formalchemy.FieldSet.render() returns a fragment of HTML with no root tag, and I cannot figure out how to get lxml to parse it into something that can be included in an element tree. What I get:

    >>> lxml.etree.fromstring(formalchemy.FieldSet(toyschema.User(), session).render())
    Traceback (most recent call last):
      File "<stdin>",
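The traceback is cut off and no answer is shown, so the following is only a sketch of one workaround: wrapping the fragment in a synthetic root element before parsing. It assumes the rendered fragment is well-formed XML (all tags closed, attributes quoted). The helper name `parse_fragment` and the sample markup are hypothetical. For markup that is not well-formed, lxml also provides `lxml.html.fragment_fromstring`, whose `create_parent` argument serves a similar purpose.

```python
try:
    from lxml import etree
except ImportError:  # fromstring behaves the same way for this case
    import xml.etree.ElementTree as etree


def parse_fragment(fragment, wrapper_tag="div"):
    """Parse a rootless, well-formed fragment by wrapping it in one element."""
    wrapped = "<%s>%s</%s>" % (wrapper_tag, fragment, wrapper_tag)
    return etree.fromstring(wrapped)


# Hypothetical fragment standing in for the output of FieldSet.render().
fragment = '<label>Name</label><input type="text" name="name"/>'
root = parse_fragment(fragment)
print(root.tag, [child.tag for child in root])  # prints div ['label', 'input']
```

The returned wrapper element can be appended to an existing tree, or its children can be moved into another parent one by one.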