lxml

python lxml inkscape namespace tags

て烟熏妆下的殇ゞ 提交于 2019-12-05 12:10:29
I am generating an SVG file that's intended to include Inkscape-specific tags. For example, inkscape:label and inkscape:groupmode . I am using lxml etree as my parser/generator. I'd like to add the label and groupmode tags to the following instance: layer = etree.SubElement(svg_instance, 'g', id="layer-id") My question is how do I achieve that in order to get the correct output form in the SVG, for example: <g inkscape:groupmode="layer" id="layer-id" inkscape:label="layer-label"> First, remember that inkscape: isnt' a namespace, it's just a convenient way of referring to a namespace that is

Remove class attribute from HTML using Python and lxml

社会主义新天地 提交于 2019-12-05 11:19:18
问题 Question How do I remove class attributes from html using python and lxml? Example I have: <p class="DumbClass">Lorem ipsum dolor sit amet, consectetur adipisicing elit</p> I want: <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit</p> What I've tried so far I've checked out lxml.html.clean.Cleaner however, it does not have a method to strip out class attributes. You can set safe_attrs_only=True however, this does not remove the class attribute. Significant searching has turned up

Replace text with HTML tag in LXML text element

霸气de小男生 提交于 2019-12-05 09:31:19
I have some lxml element: >> lxml_element.text 'hello BREAK world' I need to replace the word BREAK with an HTML break tag— <br /> . I've tried to do simple text replacing: lxml_element.text.replace('BREAK', '<br />') but it inserts the tag with escaped symbols, like <br/> . How do I solve this problem? Here's how you could do it. Setting up a sample lxml from your question: >>> import lxml >>> some_data = "<b>hello BREAK world</b>" >>> root = lxml.etree.fromstring(some_data) >>> root <Element b at 0x3f35a50> >>> root.text 'hello BREAK world' Next, create a subelement tag <br>: >>> childbr =

How to extract links from a webpage using lxml, XPath and Python?

回眸只為那壹抹淺笑 提交于 2019-12-05 09:14:25
I've got this xpath query: /html/body//tbody/tr[*]/td[*]/a[@title]/@href It extracts all the links with the title attribute - and gives the href in FireFox's Xpath checker add-on . However, I cannot seem to use it with lxml . from lxml import etree parsedPage = etree.HTML(page) # Create parse tree from valid page. # Xpath query hyperlinks = parsedPage.xpath("/html/body//tbody/tr[*]/td[*]/a[@title]/@href") for x in hyperlinks: print x # Print links in <a> tags, containing the title attribute This produces no result from lxml (empty list). How would one grab the href text (link) of a hyperlink

python爬取网站视频保存到本地

爷,独闯天下 提交于 2019-12-05 08:58:20
前言 文的文字及图片来源于网络,仅供学习、交流使用,不具有任何商业用途,版权归原作者所有,如有问题请及时联系我们以作处理。 作者: Woo_home PS:如有需要Python学习资料的小伙伴可以加点击下方链接自行获取 http://note.youdao.co-m/noteshare?id=3054cce4add8a909e784ad934f956cef 安装库 该示例使用到的库有requests、lxml、re,其中re是python自带的,所以无需安装,只需安装requests和lxml库即可 安装命令如下: pip install requestspip install lxml 分析网页数据 打开一个视频网页如下: 右键进行开发者模式,点击一个视频右键,点击Open in new tab ok,可以打开 代码实现 先导入要使用的库 import requestsfrom lxml import etreeimport re 拿到网站的ur l 获取User-Agent 发起请求 筛选数据 遍历数据 匹 配数据 保存数据 下载的视频已经保存在文件夹中 . 来源: https://www.cnblogs.com/Qqun821460695/p/11917666.html

From escaped html -> to regular html? - Python

我的未来我决定 提交于 2019-12-05 08:23:10
I used BeautifulSoup to handle XML files that I have collected through a REST API. The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so it can be displayed nicely. Unfortunately I need the HTML code. How would I go on about transforming the escaped HTML into proper markup? Help would be very much appreciated! I think you want xml.sax.saxutils.unescape from the Python standard library. E.g.: >>> from xml.sax import saxutils as su >>> s = '<foo>bar</foo>' >>> su.unescape(s) '<foo>bar</foo>' You could try the urllib module? It has a method unquote() that might suit your

Retrieve all contacts from gmail using python

故事扮演 提交于 2019-12-05 07:40:54
问题 I am using django social auth in order to retrieve contacts from gmail. I do not have any problem getting the authorization. I do a request and then I use lxml to retrieve the email addresses. The problem is that it does not display every contacts. For example, I can retrieve only 30 contacts while I have more than 300 contacts with my gmail account. Here is my view : def get_email_google(request): social = request.user.social_auth.get(provider='google-oauth2') url = 'https://www.google.com

How to use python to get google news headlines and search keywords?

孤人 提交于 2019-12-05 07:36:25
问题 I am working on a project to look through google news headlines and find keywords. I want it to: -put the headlines into a text file -remove commas, apostrophes, quotes, punctuation, etc -search keywords This is the code I have so far. I am getting the headlines, I now just need it to parse the keywords from each individual headline. from lxml import html import requests # Send request to get the web page response = requests.get('http://news.google.com') # Check if the request succeeded

xpath 的使用

允我心安 提交于 2019-12-05 07:05:48
模糊查询: def Sprider1(): pass url="http://juji123.net/tag/5_2.html" headers={ "User-Agent":"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" } request = requests.session() html = request.get(url,headers=headers).content.decode("utf-8") myhtml = etree.HTML(html)    用法: 标签[contains(@属性,“查询的内容”)] result = myhtml.xpath('//div[contains(@class,"jk")]//div[@class="summary"]') for rst in result: print(rst.text) if __name__ == '__main__': Sprider1() 什么是XML XML 指可扩展标记语言(EXtensible Markup Language) XML 是一种标记语言,很类似 HTML XML 的设计宗旨是传输数据,而非显示数据

Fail to install lxml using pip

你说的曾经没有我的故事 提交于 2019-12-05 06:36:13
This is the command I used to install lxml: sudo pip install lxml And I got the following message in the Cleaning Up stage: Cleaning up... Command /usr/bin/python -c "import setuptools, tokenize;__file__='/private/tmp/pip_build_root/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-rUFjFN-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /private/tmp/pip_build_root/lxml Storing debug log for failure in /Users/georgejor/Library/Logs/pip.log After that