lxml

Pylint Error Message: “E1101: Module 'lxml.etree' has no 'strip_tags' member”

Submitted by 天涯浪子 on 2019-12-04 11:12:05
Question: I am experimenting with lxml and Python for the first time for a personal project, and I am attempting to strip tags from a bit of source code using etree.strip_tags(). For some reason, I keep getting the error message "E1101: Module 'lxml.etree' has no 'strip_tags' member". I'm not sure why this is happening. Here's the relevant portion of my code: from lxml import etree ... DOC = etree.strip_tags(DOC_URL, 'html') print DOC Any ideas? Thanks. Answer 1: The reason for this is that pylint by …
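The answer is cut off above, but the gist is that strip_tags() does exist at runtime; pylint simply cannot introspect the compiled C extension. A minimal sketch of its actual behaviour, which also shows that it operates in place on a parsed tree, not on a URL string as in the question's code:

```python
from lxml import etree

# strip_tags() exists at runtime even though pylint flags it.
# It takes a parsed tree (not a URL) and removes the named tags
# in place, keeping their text content.
root = etree.fromstring("<div><p>Hello <b>bold</b> world</p></div>")
etree.strip_tags(root, "b")
result = etree.tostring(root, encoding="unicode")
# result == "<div><p>Hello bold world</p></div>"
```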

lxml: use namespace prefixes instead of ns0, ns1

Submitted by 北慕城南 on 2019-12-04 10:13:51
I have just started with the lxml basics and I am stuck with namespaces: I need to generate XML like this: <CityModel xmlns:bldg="http://www.opengis.net/citygml/building/2.0"> <cityObjectMember> <bldg:Building> <bldg:function>1000</bldg:function> </bldg:Building> </cityObjectMember> </CityModel> by using the following code: from lxml import etree cityModel = etree.Element("cityModel") cityObject = etree.SubElement(cityModel, "cityObjectMember") bldg = etree.SubElement(cityObject, "{http://schemas.opengis.net/citygml/building/2.0/building.xsd}bldg") function = etree.SubElement(bldg, "{bldg: …
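The excerpt stops mid-code, but the usual fix is to pass an nsmap when creating the root element and to qualify child tags with the full namespace URI; lxml then serializes them with the declared prefix instead of ns0. A sketch using the URI from the desired output above:

```python
from lxml import etree

# Map the "bldg" prefix to the namespace URI once, on the root element;
# lxml then emits bldg:... for qualified children instead of ns0:...
BLDG = "http://www.opengis.net/citygml/building/2.0"

cityModel = etree.Element("CityModel", nsmap={"bldg": BLDG})
cityObject = etree.SubElement(cityModel, "cityObjectMember")
building = etree.SubElement(cityObject, "{%s}Building" % BLDG)
function = etree.SubElement(building, "{%s}function" % BLDG)
function.text = "1000"

xml = etree.tostring(cityModel, pretty_print=True, encoding="unicode")
```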

Python XML Parsing [duplicate]

Submitted by 一世执手 on 2019-12-04 09:26:51
Question: This question already has answers here: How do I parse XML in Python? (15 answers) Closed 6 years ago. *Note: lxml will not run on my system. I was hoping to find a solution that does not involve lxml. I have gone through some of the documentation around here already, and am having difficulty getting this to work the way I would like. I would like to parse an XML file that looks like this: <dict> <key>1375</key> <dict> <key>Key 1</key><integer>1375</integer> <key>Key 2</key><string>Some …
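Since lxml is ruled out, the standard library's xml.etree.ElementTree can handle this. The plist-style <dict> alternates <key> and value children, so pairing them up recovers a Python dict. A sketch, with "Some value" invented to complete the truncated sample:

```python
import xml.etree.ElementTree as ET

# Stdlib-only parsing of the plist-like structure; "Some value" is a
# made-up stand-in for the truncated <string> content in the question.
data = """<dict>
  <key>1375</key>
  <dict>
    <key>Key 1</key><integer>1375</integer>
    <key>Key 2</key><string>Some value</string>
  </dict>
</dict>"""

root = ET.fromstring(data)
inner = root.find("dict")                       # the nested <dict>
children = list(inner)                          # alternating key/value elements
record = {children[i].text: children[i + 1].text
          for i in range(0, len(children), 2)}
```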

Python: adding xml schema attributes with lxml

Submitted by 南笙酒味 on 2019-12-04 09:02:45
I've written a script that prints out all the .xml files in the current directory in XML format, but I can't figure out how to add the xmlns attributes to the top-level tag. The output I want is: <?xml version='1.0' encoding='utf-8'?> <databaseChangeLog xmlns="http://www.host.org/xml/ns/dbchangelog" xmlns:xsi="http://www.host.org/2001/XMLSchema-instance" xsi:schemaLocation="www.host.org/xml/ns/dbchangelog"> <include file="cats.xml"/> <include file="dogs.xml"/> <include file="fish.xml"/> <include file="meerkats.xml"/> </databaseChangeLog> However, here is the output I am getting: <?xml …
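The usual lxml approach is a default namespace in nsmap plus a namespaced attribute for xsi:schemaLocation. A sketch using the URIs from the desired output, except that the standard w3.org URI is used for xsi where the question's sample shows a host.org placeholder:

```python
from lxml import etree

# Default namespace (the None key) plus an "xsi" prefix in nsmap;
# schemaLocation is set with its fully qualified attribute name.
DB = "http://www.host.org/xml/ns/dbchangelog"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

root = etree.Element("{%s}databaseChangeLog" % DB, nsmap={None: DB, "xsi": XSI})
root.set("{%s}schemaLocation" % XSI, "www.host.org/xml/ns/dbchangelog")
for name in ("cats.xml", "dogs.xml", "fish.xml", "meerkats.xml"):
    etree.SubElement(root, "{%s}include" % DB, file=name)

out = etree.tostring(root, pretty_print=True, xml_declaration=True,
                     encoding="utf-8").decode("utf-8")
```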

Python lxml parsing svg file

Submitted by 做~自己de王妃 on 2019-12-04 07:34:37
I'm trying to parse .svg files from http://kanjivg.tagaini.net/, but I can't successfully extract the information inside them. Edit 1: (full file) http://www.filedropper.com/0f9ab A part of 0f9ab.svg looks like this: <svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109"> <g id="kvg:StrokePaths_0f9ab" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;"> <g id="kvg:0f9ab" kvg:element="嶺"> <g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top" kvg:radical="general"> <path id="kvg:0f9ab-s1" kvg:type="㇑a" d="M53.26,9.38c0.99,0.99 …
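The kvg:element and kvg:type attributes live in KanjiVG's own namespace, so they must be looked up with the full URI. A self-contained sketch on a trimmed stand-in fragment; the xmlns:kvg declaration below is added by hand (the full file declares it on the root), and its URI is an assumption based on the project site:

```python
from lxml import etree

# Trimmed stand-in for 0f9ab.svg; the kvg namespace URI is assumed
# to be http://kanjivg.tagaini.net, matching the project site.
svg = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:kvg="http://kanjivg.tagaini.net"
     width="109" height="109" viewBox="0 0 109 109">
  <g id="kvg:0f9ab" kvg:element="嶺">
    <path id="kvg:0f9ab-s1" d="M53.26,9.38c0.99,0.99"/>
  </g>
</svg>"""

SVG = "http://www.w3.org/2000/svg"
KVG = "http://kanjivg.tagaini.net"
root = etree.fromstring(svg)
paths = root.findall(".//{%s}path" % SVG)    # elements need the SVG namespace
element = root.find(".//{%s}g" % SVG).get("{%s}element" % KVG)
```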

Using pyKML to parse KML Document

Submitted by 主宰稳场 on 2019-12-04 07:24:39
I'm using the pyKML module to extract coordinates from a given KML file. My Python code is as follows: from pykml import parser fileobject = parser.fromstring(open('MapSource.kml', 'r').read()) root = parser.parse(fileobject).getroot() print(xml.Document.Placemark.Point.coordinates) However, on running this, I get the following error: ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. Looking for solutions, I came across this one: http://twigstechtips.blogspot.in/2013/06/python-lxml-strings-with-encoding …
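The linked workaround boils down to this: lxml (which pyKML wraps) refuses a Python str that still carries an encoding declaration, but accepts the same document as bytes, e.g. a file opened in 'rb' mode. A sketch reproducing the error and the fix with lxml directly:

```python
from lxml import etree

# A str with an encoding declaration triggers the ValueError from the
# question; the same document passed as bytes parses fine. With pyKML,
# the equivalent fix is parser.parse(open('MapSource.kml', 'rb')).
doc = "<?xml version='1.0' encoding='utf-8'?><kml><Document/></kml>"

try:
    etree.fromstring(doc)                      # str input -> ValueError
    raised = False
except ValueError:
    raised = True

root = etree.fromstring(doc.encode("utf-8"))   # bytes input parses
```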

Python: Modifying the Document Tree with Beautiful Soup

Submitted by 蹲街弑〆低调 on 2019-12-04 05:32:47
Modifying the document tree: Beautiful Soup's strength is searching the document tree, but it can also modify the tree with little effort. Changing a tag's name and attributes: this feature was already covered in the Attributes chapter, but it bears repeating. Rename a tag, change an attribute's value, add or remove attributes: soup = BeautifulSoup('<b class="boldest">Extremely bold</b>') tag = soup.b tag.name = "blockquote" tag['class'] = 'verybold' tag['id'] = 1 tag # <blockquote class="verybold" id="1">Extremely bold</blockquote> del tag['class'] del tag['id'] tag # <blockquote>Extremely bold</blockquote> Modifying .string: assigning to a tag's .string attribute replaces the tag's current contents with the new string: markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>' soup = BeautifulSoup(markup) tag = soup.a tag.string = "New link text." tag # <a href="http://example.com/">New link text.</a> Note: if the tag contains other tags, assigning to its .string attribute overwrites everything inside it, including child tags. append(): the Tag.append() method adds content to a tag, just like Python's list .append() …
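The snippets above, assembled into a runnable form (the parser choice is mine; the original omits it):

```python
from bs4 import BeautifulSoup

# Rename a tag and edit its attributes in place.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b
tag.name = "blockquote"
tag["class"] = "verybold"
tag["id"] = 1
del tag["class"]
del tag["id"]

# Assigning .string replaces everything inside the tag, including child tags.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup2 = BeautifulSoup(markup, "html.parser")
link = soup2.a
link.string = "New link text."
```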

Python Study Notes: Web Scraping — Data Extraction with BeautifulSoup4

Submitted by こ雲淡風輕ζ on 2019-12-04 05:31:26
Contents: CSS selectors: BeautifulSoup4; the four object types: 1. Tag 2. NavigableString 3. BeautifulSoup 4. Comment; traversing the document tree: 1. direct children: the .contents and .children attributes 2. all descendants: the .descendants attribute 3. node content: the .string attribute; searching the document tree: 1. find_all(name, attrs, recursive, text, **kwargs) 2. CSS selectors: (1) find by tag name (2) find by class name (3) find by id (4) combined lookups (5) find by attribute (6) getting content; case study: a crawler using BeautifulSoup4. CSS selectors: BeautifulSoup4. Like lxml, Beautiful Soup is an HTML/XML parser whose main job is likewise parsing and extracting HTML/XML data. lxml traverses the document only locally, while Beautiful Soup is based on the HTML DOM: it loads and parses the entire document into a DOM tree, so its time and memory overhead are much larger and its performance is lower than lxml's. BeautifulSoup makes parsing HTML fairly simple, with a very human-friendly API; it supports CSS selectors, the HTML parser in the Python standard library, and lxml's XML parser. Beautiful …
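A short sketch of the traversal attributes from the outline above (.contents, .descendants, .string), on invented markup:

```python
from bs4 import BeautifulSoup

html = "<html><body><p class='title'><b>The story</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

body = soup.body
direct = body.contents                 # only direct children: [<p>...</p>]
descendants = list(body.descendants)   # every node below, recursively
text = soup.p.string                   # the single nested string
```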

Usage of the BeautifulSoup Library in Python

Submitted by 不打扰是莪最后的温柔 on 2019-12-04 05:28:48
Introduction to BeautifulSoup: Beautiful Soup is a Python library whose main purpose is extracting data from web pages. The official description reads: Beautiful Soup provides simple, Pythonic functions for navigating, searching, and modifying a parse tree. It is a toolkit that parses documents and hands users the data they need to extract; because it is simple, a complete application can be written without much code. Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You don't need to think about encodings, unless the document doesn't declare one, in which case Beautiful Soup cannot detect the encoding automatically and you only need to state the original encoding. Beautiful Soup has become a Python parsing interface as capable as lxml and html5lib, giving users the flexibility to choose different parsing strategies or trade them for raw speed. BeautifulSoup supports the HTML parser in the Python standard library as well as several third-party parsers; if we don't install one, Python uses its default parser. The lxml parser is more powerful and faster, so lxml is recommended. Let's first look at a complete example: parsing the 58.com site with BeautifulSoup, mainly using BeautifulSoup's select() method: #encoding:UTF-8 from bs4 import BeautifulSoup …
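The 58.com example is truncated above; a minimal, self-contained illustration of the select() method it relies on, using invented markup in place of the real page:

```python
from bs4 import BeautifulSoup

# Stand-in markup; the original post scrapes 58.com listings instead.
html = """
<div id="list">
  <p class="item">flat A</p>
  <p class="item">flat B</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
items = soup.select("#list .item")     # CSS: descendants with class "item"
texts = [p.get_text() for p in items]
```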

Python: CSS Selectors with BeautifulSoup4

Submitted by 半腔热情 on 2019-12-04 05:28:38
Contents: CSS selectors: BeautifulSoup4; example; I. The four object types: 1. Tag 2. NavigableString 3. BeautifulSoup 4. Comment; II. Traversing the document tree: 1. direct children: the .contents and .children attributes 2. all descendants: the .descendants attribute 3. node content: the .string attribute; III. Searching the document tree: find_all(name, attrs, recursive, text, **kwargs); IV. CSS selectors: (1) find by tag name (2) find by class name (3) find by id (4) combined lookups (5) find by attribute (6) getting content. CSS selectors: BeautifulSoup4. Like lxml, Beautiful Soup is an HTML/XML parser whose main job is likewise parsing and extracting HTML/XML data. lxml traverses the document only locally, while Beautiful Soup is based on the HTML DOM: it loads and parses the entire document into a DOM tree, so its time and memory overhead are much larger and its performance is lower than lxml's. BeautifulSoup makes parsing HTML fairly simple, with a very human-friendly API; it supports CSS selectors, the HTML parser in the Python standard library, and lxml's XML parser. Beautiful Soup 3 is no longer under development …
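One line per selector style from the outline above, on invented markup:

```python
from bs4 import BeautifulSoup

html = ('<div id="main">'
        '<a class="link" href="http://example.com">here</a>'
        '<a>bare</a></div>')
soup = BeautifulSoup(html, "html.parser")

by_tag   = soup.select("a")                             # (1) by tag name
by_class = soup.select(".link")                         # (2) by class name
by_id    = soup.select("#main")                         # (3) by id
combined = soup.select("div#main a.link")               # (4) combined
by_attr  = soup.select('a[href="http://example.com"]')  # (5) by attribute
content  = combined[0].get_text()                       # (6) getting content
```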