lxml

Python - 'ascii' codec can't decode byte \xbd in position

左心房为你撑大大i submitted on 2019-12-14 01:05:48
Question: I'm using lxml to scrape some text from web pages. Some of the text includes fractions, such as 5½, and I need to get these into float format. These attempts fail:

    ugly_fraction.encode('utf-8')                      # doesn't change to a usable format
    ugly_fraction.replace('\xbd', '')                  # throws an error
    ugly_fraction.encode('utf-8').replace('\xbd', '')  # throws an error

Answer 1: unicodedata.numeric: Returns the numeric value assigned to the Unicode character unichr as float. If no such value is defined, default is returned, or, if not given,
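
The lookup that the answer points to can be sketched as follows. This is a minimal illustration, assuming Python 3 and input consisting of ASCII digits optionally followed by a single-character fraction; the helper name `fraction_text_to_float` is hypothetical and not part of the original answer.

```python
import unicodedata


def fraction_text_to_float(text):
    """Convert a string such as '5½' to a float."""
    whole = ""
    fraction = 0.0
    for ch in text.strip():
        if ch in "0123456789":
            # Collect the ASCII digits that make up the whole-number part.
            whole += ch
        else:
            # unicodedata.numeric returns 0.5 for '½'; it raises ValueError
            # for characters that have no numeric value.
            fraction += unicodedata.numeric(ch)
    return (int(whole) if whole else 0) + fraction


print(fraction_text_to_float("5½"))  # 5.5
```

Keeping the digit handling separate from the fraction lookup avoids treating '5' and '½' as two independent numbers to add, which would break for multi-digit values such as '12½'.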

How to install lxml for PyPy?

a 夏天 submitted on 2019-12-14 00:25:38
Question: I've created a virtualenv for PyPy with:

    virtualenv test -p `which pypy`
    source test/bin/activate

I installed the following dependencies:

    sudo apt-get install python-dev libxml2 libxml2-dev libxslt-dev

And then I run:

    pip install --upgrade pypy

As a result I get a lot of errors looking like this:

    src/lxml/lxml.etree.c:234038:22: error: 'PyThreadState {aka struct _ts}' has no member named 'use_tracing'

How do I properly install lxml for PyPy 2.6.0?

Answer 1: I used the following fork of lxml for

Creating xml from MySQL query with Python and lxml

一笑奈何 submitted on 2019-12-13 22:14:41
Question: I am trying to use Python and lxml to create an XML file from a MySQL query result. Here is the format I want:

    <DATA>
      <ROW>
        <FIELD1>content</FIELD1>
        <FIELD2>content</FIELD2>
      </ROW>
    </DATA>

For some reason the code isn't formatting right and the XML will not validate. Here is that code:

    from lxml import etree
    from lxml.etree import tostring
    from lxml.builder import E
    import MySQLdb

    try:
        conn = MySQLdb.connect(host = 'host',user = 'user',passwd = 'pass',db = 'db')
        cursor = conn.cursor()
    except:
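
The DATA/ROW/FIELD layout the question asks for can be built one element at a time. The sketch below is not the asker's code: `rows_to_xml`, `sample_rows`, and the field names are hypothetical stand-ins for what a database cursor would return, and the import falls back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree


def rows_to_xml(rows, field_names):
    """Build <DATA><ROW><FIELDn>...</FIELDn></ROW></DATA> from row tuples."""
    root = etree.Element("DATA")
    for row in rows:
        row_el = etree.SubElement(root, "ROW")
        for name, value in zip(field_names, row):
            field = etree.SubElement(row_el, name)
            # Element text must be a string; NULL columns become empty text.
            field.text = "" if value is None else str(value)
    return etree.tostring(root, encoding="unicode")


# Hypothetical rows, shaped like the tuples a cursor.fetchall() call returns.
sample_rows = [("a", 1), ("b", None)]
xml_text = rows_to_xml(sample_rows, ["FIELD1", "FIELD2"])
print(xml_text)
```

Building the tree with Element and SubElement leaves escaping and tag closing to the serializer, which is usually what keeps the output well-formed.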

Parsing web pages in a Python crawler with the lxml module

核能气质少年 submitted on 2019-12-13 21:23:14
1. Installing the module

Installation on Windows:

Method 1:

    pip3 install lxml

Method 2: download the wheel file matching your system version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml and install it:

    pip3 install lxml-4.2.1-cp36-cp36m-win_amd64.whl  # path to the downloaded file

Installation on Linux:

Method 1:

    pip3 install lxml

Method 2:

    yum install -y epel-release libxslt-devel libxml2-devel openssl-devel

2. Using the module

    from lxml.html import etree

Demo:

    import requests
    from lxml.html import etree

    rp = requests.get('http://www.baidu.com')
    html = etree.HTML(rp.text)  # the parsed object can be queried with XPath

Source: CSDN Author: 学习-永无止境 Link: https://blog.csdn.net/weixin_45974628

Cannot install lxml 3.3.3 on OSX 10.9 with buildout

ぐ巨炮叔叔 submitted on 2019-12-13 21:19:22
Question: I have seen numerous related posts but have not had any luck getting this to work. The log shows:

    We have no distributions for lxml that satisfies 'lxml'.
    Getting distribution for 'lxml'.
    Running easy_install:
    "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python" "-c" "from setuptools.command.easy_install import main; main()" "-mZUNxd" "/Users/brad/Development/python/eggs/tmpLpoVC3" "-q" "/var/folders/3c/mdys56lx2wbf9rjlmqy4nhy40000gn/T/tmpZx0t7aget

builtins.TypeError: reading file objects must return plain strings : Error in Xpath - python

限于喜欢 submitted on 2019-12-13 20:48:25
Question: Here's my code:

    import os
    os.chdir('d:/py/xml/')
    from lxml import etree
    from io import StringIO

    #----------------------------------------------------------------------
    def parseXML(xmlFile):
        """
        Parse the xml
        """
        f = open(xmlFile)
        xml = f.read()
        f.close()

        tree = etree.parse(StringIO(xml))
        context = etree.iterparse(StringIO(xml))
        for action, elem in context:
            if not elem.text:
                text = 'None'
            else:
                text = elem.text
            print (elem.tag + ' => ' + text)

    if __name__ == "__main__":
        parseXML("example.xml
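
One commonly suggested way around this TypeError is to give iterparse a byte stream rather than a text stream. The sketch below assumes that the error comes from the StringIO object passed to iterparse in the question; `iter_tag_text` and the inline sample document are hypothetical, and the import falls back to the standard library's ElementTree when lxml is not installed.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree


def iter_tag_text(xml_bytes):
    """Yield (tag, text) pairs, using 'None' for empty text as the question does."""
    # iterparse reads from a binary file-like object; by default it reports
    # each element once its closing tag has been seen.
    for _event, elem in etree.iterparse(BytesIO(xml_bytes)):
        text = elem.text if elem.text else "None"
        yield elem.tag, text


sample = b"<root><item>one</item><item/></root>"
for tag, text in iter_tag_text(sample):
    print(tag + " => " + text)
```

When reading from disk, opening the file in binary mode (`open(path, "rb")`) and passing the file object to iterparse serves the same purpose as BytesIO here.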

Word Breaks in text extraction , Lxml Xpath

时间秒杀一切 submitted on 2019-12-13 18:32:37
Question: I want to extract words with strikethroughs, i.e. those inside the <w:delText> tag. I have used an expression that extracts them successfully, except that some words appear broken. For example, the word "They" appears as 'T' and 'hey'. Given below is an XML sample where the problem persists:

    <w:delText xml:space="preserve">. </w:delText></w:r><w:r w:rsidR="0020338C" w:rsidDel="00147CFE"><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/><w:sz w:val="24"/></w:rPr><w:delText>T</w
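
The sample suggests that one word is stored across several runs, each with its own w:delText element. A possible approach, sketched here under that assumption, is to join the pieces per paragraph instead of treating each w:delText as a word. The `SAMPLE` document and the function name are hypothetical, and the import falls back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical paragraph in which the deleted word "They" is split over two runs.
SAMPLE = (
    '<w:p xmlns:w="%s">'
    '<w:del><w:r><w:delText>T</w:delText></w:r>'
    '<w:r><w:delText>hey</w:delText></w:r></w:del>'
    '</w:p>' % W_NS
)


def deleted_text_per_paragraph(xml_text):
    """Return one string of deleted text for each paragraph that has any."""
    root = etree.fromstring(xml_text)
    results = []
    for para in root.iter("{%s}p" % W_NS):
        # Concatenate the fragments in document order instead of listing them.
        pieces = [node.text or "" for node in para.findall(".//w:delText", namespaces=NS)]
        if pieces:
            results.append("".join(pieces))
    return results


print(deleted_text_per_paragraph(SAMPLE))  # ['They']
```

Joining per paragraph also merges separate deletions that share a paragraph, so the result may need to be split on whitespace afterwards to recover individual words.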

Are there any benefits of using Beautiful Soup to parse XML over using lxml alone?

六月ゝ 毕业季﹏ submitted on 2019-12-13 18:13:02
Question: I often use Beautiful Soup to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. When researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError. This leads me to my question: if lxml can handle large files and Beautiful Soup cannot, are there any benefits to using Beautiful Soup instead of simply using lxml directly?

Answer 1: If you look at this
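
For the large-file case the question describes, a streaming parse is one option, because it avoids holding the whole tree in memory at once. The sketch below is an illustration rather than the answer's method: `count_records`, the "row" tag, and the in-memory sample are hypothetical, and the import falls back to the standard library's ElementTree when lxml is not installed.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree


def count_records(source, record_tag):
    """Count elements named record_tag while streaming through the document."""
    count = 0
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            count += 1
            # Drop the element's text and children once it has been handled;
            # the emptied element itself stays attached to its parent.
            elem.clear()
    return count


sample = BytesIO(b"<data><row>1</row><row>2</row><row>3</row></data>")
print(count_records(sample, "row"))  # 3
```

For a real file, `source` would be a path or a file opened in binary mode; the counting step stands in for whatever per-record processing is needed.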

Installing lxml, libxml2, libxslt for Python 3.5 on Windows 10

只愿长相守 submitted on 2019-12-13 17:30:15
Question: I first tried to run the basic pip install command for it:

    C:\Program Files (x86)\Python35-32>pip install lxml
    Collecting lxml
      Using cached lxml-3.6.4.tar.gz
    Building wheels for collected packages: lxml
      Running setup.py bdist_wheel for lxml ... error
      Complete output from command "c:\program files (x86)\python35-32\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\Users\\Djidiouf\\AppData\\Local\\Temp\\pip-build-ovqa6ncd\\lxml\\setup.py';f=getattr(tokenize, 'open', open)(__file__)

html parsing with lxml when there's no root tag

走远了吗. submitted on 2019-12-13 16:27:14
Question: I've been building a scaffolding library for sqlalchemy using lxml and formalchemy, and I'm having a hard time getting them to play nicely. Specifically, formalchemy.FieldSet.render() returns a fragment of HTML with no root tag, and I cannot figure out how to get lxml to parse it into something that can be included in an element tree. What I get:

    >>> lxml.etree.fromstring(formalchemy.FieldSet(toyschema.User(), session).render())
    Traceback (most recent call last):
      File "<stdin>",
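
A common workaround for a rootless fragment is to wrap it in a temporary parent element before parsing. The sketch below assumes the rendered fragment is well-formed XML (for example, `<input ... />` with a closing slash); `parse_fragment` and the sample markup are hypothetical, and the import falls back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is close enough for this sketch.
    import xml.etree.ElementTree as etree


def parse_fragment(fragment, wrapper_tag="div"):
    """Parse sibling elements by giving them a single temporary root."""
    wrapped = "<%s>%s</%s>" % (wrapper_tag, fragment, wrapper_tag)
    return etree.fromstring(wrapped)


# Hypothetical stand-in for the output of FieldSet.render().
fragment = '<label for="name">Name</label><input id="name" type="text" />'
wrapper = parse_fragment(fragment)
print([child.tag for child in wrapper])  # ['label', 'input']
```

The children of the wrapper can then be appended to another tree. For markup that is HTML but not well-formed XML, lxml.html provides `fragment_fromstring(..., create_parent=True)` and `fragments_fromstring`, which serve the same purpose with an HTML parser.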