lxml

How to get data from a web page with Selenium WebDriver

对着背影说爱祢 submitted on 2019-11-29 18:15:36
I want to fetch the company name, email, and phone number from this link and put them in an Excel file. I want to do the same for all the pages of the website. I already have the logic to fetch the links in the browser and switch between them, but I'm unable to fetch the data from the website. Can anybody suggest an improvement to the code I have written? Below is my code:

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.keys import Keys
    import time
    from lxml import html
    import requests
    import xlwt
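Below is a minimal sketch of the extraction step. It assumes each company sits in its own `<div class="company">` with child elements whose classes are `name`, `email`, and `phone`. Those class names and the sample markup are hypothetical, because the real page's structure is not shown. Inside the Selenium loop you would pass `driver.page_source` to `extract_companies()` after switching to each page. Instead of `xlwt`, the sketch uses the standard-library `csv` module, whose output Excel opens directly.

```python
import csv
import io

from lxml import html

# Hypothetical markup standing in for driver.page_source; the real page's
# structure is unknown, so these class names are assumptions.
SAMPLE_PAGE = """
<html><body>
  <div class="company">
    <h2 class="name">Acme Ltd</h2>
    <span class="email">info@acme.example</span>
    <span class="phone">+1 555 0100</span>
  </div>
  <div class="company">
    <h2 class="name">Globex</h2>
    <span class="email">sales@globex.example</span>
    <span class="phone">+1 555 0199</span>
  </div>
</body></html>
"""


def first_text(element, path):
    """Return the stripped first text match of an XPath, or '' if none."""
    found = element.xpath(path)
    return found[0].strip() if found else ""


def extract_companies(page_source):
    """Return one (name, email, phone) tuple per company block."""
    tree = html.fromstring(page_source)
    rows = []
    for block in tree.xpath('//div[@class="company"]'):
        # Relative paths ("./...") keep each lookup inside the current block.
        rows.append((
            first_text(block, './*[@class="name"]/text()'),
            first_text(block, './*[@class="email"]/text()'),
            first_text(block, './*[@class="phone"]/text()'),
        ))
    return rows


def rows_to_csv(rows):
    """Serialize rows, with a header line, as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(("name", "email", "phone"))
    writer.writerows(rows)
    return buffer.getvalue()
```

To write a real file, pass a file opened with `newline=""` to `csv.writer` instead of the `StringIO` buffer.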

Iterparse a big XML file with a low memory footprint and get all sequence elements, even nested ones

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-29 18:03:26
I have written a small Python script to parse XML data, based on Liza Daly's blog post. However, my code does not parse all the nodes: for example, when a person has multiple addresses, it takes only the first one. The XML tree looks like this:

    lgs
      entities
        entity
          id
          name
          addressess
            address
              address1
            address
              address1
        entity
          id
          (...)

And this is the Python script:

    import os
    import time
    from datetime import datetime
    import lxml.etree as ET
    import pandas as pd

    xml_file = '.\\FILE.XML'
    file_name, file_extension = os.path.splitext(os
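Below is a minimal sketch of the low-memory pattern. It assumes the tag names from the tree above and uses invented sample data. Because the script above is cut off, the real cause cannot be confirmed, but a common reason only the first address comes back is `find()`/`findtext()`, which return only the first match. `findall()` returns every match. The cleanup loop is the clear-and-delete-siblings pattern from Liza Daly's post: each processed `<entity>` is cleared and already-processed siblings are deleted, so the in-memory tree stays small.

```python
import io

from lxml import etree

# Invented sample data following the tree shape shown in the question.
SAMPLE_XML = b"""<lgs><entities>
  <entity><id>1</id><name>Ann</name>
    <addressess>
      <address><address1>1 Main St</address1></address>
      <address><address1>2 High St</address1></address>
    </addressess>
  </entity>
  <entity><id>2</id><name>Bob</name>
    <addressess><address><address1>3 Low Rd</address1></address></addressess>
  </entity>
</entities></lgs>"""


def iter_entities(source):
    """Yield one dict per <entity>, keeping every nested <address1>."""
    for _event, elem in etree.iterparse(source, events=("end",), tag="entity"):
        yield {
            "id": elem.findtext("id"),
            "name": elem.findtext("name"),
            # findall() returns *every* match; find()/findtext() stop at the first.
            "addresses": [a.text for a in elem.findall("addressess/address/address1")],
        }
        # Free memory: clear this entity and drop already-processed siblings.
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
```

With a real file, `pd.DataFrame(list(iter_entities(xml_file)))` would turn the results into a DataFrame.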

SSL: CERTIFICATE_VERIFY_FAILED certificate verify failed

无人久伴 submitted on 2019-11-29 17:04:00
    from lxml import html
    import requests

    url = "https://website.com/"
    page = requests.get(url)
    tree = html.fromstring(page.content)

    SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)

I ran this script but got this error. How can I fix it?

Since your URL is "an internal corporate URL" (as stated in the comments), I'm guessing it uses a self-signed certificate or one issued by a self-signed CA certificate. If that is in fact the case, you have two options: (1) provide the path to your corporate CA (including the complete chain of intermediate certificates
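Below is a minimal sketch of option (1). It assumes your corporate CA chain is saved as a PEM bundle at a hypothetical path. In requests, `verify` accepts either a boolean or a path to a CA bundle. Passing the path keeps certificate checking on, but checks against your CA instead of the default bundle. The optional `session` parameter is there only so the function can be exercised without network access.

```python
import requests
from lxml import html

# Hypothetical path: point this at your corporate CA bundle (PEM), including
# the full chain of intermediate certificates.
CORPORATE_CA_BUNDLE = "/path/to/corporate-ca-chain.pem"


def fetch_tree(url, ca_bundle=CORPORATE_CA_BUNDLE, session=None):
    """Fetch url, verifying TLS against ca_bundle, and parse it with lxml."""
    session = session or requests.Session()
    # verify may be a path to a CA bundle instead of True/False.
    response = session.get(url, verify=ca_bundle, timeout=10)
    response.raise_for_status()
    return html.fromstring(response.content)
```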

Python 3.4.0: 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128) (Ubuntu 14.04)

纵饮孤独 submitted on 2019-11-29 16:08:36
I'm trying to retrieve some data from the web using urllib and lxml, but I get an error and have no idea how to fix it:

    url = 'http://sum.in.ua/?swrd=автор'
    page = urllib.request.urlopen(url)

The error itself:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-15: ordinal not in range(128)

This time the API URL contains Ukrainian characters, but when I use an API URL without any Ukrainian letters, like this one:

    url = "http://www.toponymic-dictionary.in.ua/index.php?option=com_content&view=section&layout=blog&id=8&Itemid=9"
    page = urllib.request.urlopen(url)
    pageWritten = page.read()
    pageReady =
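The HTTP request line is sent as ASCII. Positions 11-15 line up exactly with the five Cyrillic letters in `GET /?swrd=автор HTTP/1.1`, which suggests that is where the encode fails. A minimal sketch of one fix is to percent-encode the query as UTF-8 before calling `urlopen`. The helper names `build_url` and `fetch_html` are illustrative, not part of urllib.

```python
import urllib.request
from urllib.parse import urlencode

from lxml import html


def build_url(base, **params):
    """Append params as a query string, percent-encoding non-ASCII as UTF-8."""
    return base + "?" + urlencode(params)


def fetch_html(url):
    """Download url and parse the response with lxml (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return html.fromstring(response.read())


# The Cyrillic word becomes %D0%B0%D0%B2..., so the request line is pure ASCII.
url = build_url("http://sum.in.ua/", swrd="автор")
```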

Python XML Remove Some Elements and Their Children but Keep Specific Elements and Their Children

感情迁移 submitted on 2019-11-29 15:57:27
I have a very large .xml file, and I am trying to create a new .xml file that contains only a small part of its contents. I want to specify an attribute (in my case, itemID) and a few specific values for it, and then strip away all the elements except the ones with those itemIDs, along with their children. My large .xml file looks something like this:

    <?xml version='1.0' encoding='UTF-8'?>
    <api version="2">
      <currentTime>2013-02-27 17:00:18</currentTime>
      <result>
        <rowset name="assets" key="itemID" columns="itemID,locationID,typeID,quantity,flag,singleton">
          <row itemID=
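Below is a minimal sketch using lxml. It assumes the rows to filter are the direct `<row>` children of `result/rowset`, and it uses invented sample rows because the real file is cut off above. Only those top-level rows are removed, so every kept row keeps its nested children. This approach loads the whole document into memory. For a truly huge file, an `iterparse`-based approach would scale better.

```python
from lxml import etree

# Invented sample rows; only the outer structure follows the question.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<api version="2">
  <currentTime>2013-02-27 17:00:18</currentTime>
  <result>
    <rowset name="assets" key="itemID" columns="itemID,locationID,typeID,quantity,flag,singleton">
      <row itemID="1" typeID="10"/>
      <row itemID="2" typeID="20">
        <rowset name="contents"><row itemID="21" typeID="21"/></rowset>
      </row>
      <row itemID="3" typeID="30"/>
    </rowset>
  </result>
</api>"""


def keep_items(root, wanted_ids):
    """Remove top-level rows whose itemID is not in wanted_ids.

    Kept rows are left untouched, so their nested children survive.
    """
    for rowset in root.iterfind("result/rowset"):
        # findall() returns a list of direct children, so removing is safe.
        for row in rowset.findall("row"):
            if row.get("itemID") not in wanted_ids:
                rowset.remove(row)
    return root
```

For a real file, start from `etree.parse("big.xml").getroot()`. Save the result with `etree.ElementTree(root).write("subset.xml", xml_declaration=True, encoding="UTF-8")`.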

XPathEvalError: Unregistered function for matches() in lxml

佐手、 submitted on 2019-11-29 15:44:08
I am trying to use the following XPath query in Python:

    from lxml.html.soupparser import fromstring
    root = fromstring(inString)
    nodes = root.xpath(".//p3[matches(.,'ABC')]//preceding::p2//p3")

but it gives me this error:

    nodes = root.xpath(".//p3[matches(.,'ABC')]//preceding::p2//p3")
    File "lxml.etree.pyx", line 1507, in lxml.etree._Element.xpath (src\lxml\lxml.etree.c:52198)
    File "xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src\lxml\lxml.etree.c:152124)
    File "xpath.pxi",
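`matches()` is an XPath 2.0 function. lxml, through libxml2, implements XPath 1.0, which is why the function is reported as unregistered. Below is a minimal sketch of two workarounds on a made-up document. The first is XPath 1.0 `contains()` for a literal substring. The second is the EXSLT regular-expression function `re:test()`, which lxml supports once the `re` prefix is mapped to its namespace. Elements returned by `lxml.html.soupparser.fromstring` are ordinary lxml elements, so the same `xpath()` calls should work on them.

```python
from lxml import etree

# Namespace URI lxml uses for its EXSLT regular-expression functions.
REGEXP_NS = {"re": "http://exslt.org/regular-expressions"}

# Made-up document shaped like the p2/p3 structure in the question.
doc = etree.fromstring(
    "<root><p2><p3>first</p3></p2><p3>xABCy</p3><p3>other</p3></root>"
)

# XPath 1.0 substring test: enough when 'ABC' is a literal.
by_contains = doc.xpath(".//p3[contains(., 'ABC')]")

# EXSLT regex test: the closest replacement for XPath 2.0 matches().
by_regex = doc.xpath(".//p3[re:test(., 'ABC')]", namespaces=REGEXP_NS)

# The question's full query with matches() swapped for re:test().
before = doc.xpath(
    ".//p3[re:test(., 'ABC')]//preceding::p2//p3", namespaces=REGEXP_NS
)
```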

Missing some text when iterating over XML elements in Python

喜你入骨 submitted on 2019-11-29 15:18:41
I am running the following code in Python 2.7.3 on Mac OS X 10.6.8:

    import StringIO
    from lxml import etree

    f = open('./foo', 'r')
    doc = ""
    while 1:
        line = f.readline()
        doc += line
        if line == "":
            break

    tree = etree.parse(StringIO.StringIO(doc), etree.HTMLParser())
    r = tree.xpath('//foo')
    for i in r:
        for j in i.iter():
            print j.tag, j.text

The file foo contains:

    <foo> AAA <bar> BBB </bar> XXX </foo>

The output is:

    foo AAA
    bar BBB

Why am I not getting the text XXX? How do I access it? Thanks.

Try this:

    from lxml import etree
    tree = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
    foos =
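In lxml's data model, text that follows an element's closing tag is stored in that element's `.tail`, not in the parent's `.text`. Here, ` XXX ` is the tail of `<bar>`. Below is a minimal Python 3 sketch that prints both fields. It also uses `itertext()`, which yields text and tails in document order.

```python
from lxml import etree

foo = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")

for el in foo.iter():
    # .text follows the start tag; .tail follows the element's end tag.
    print(el.tag, repr(el.text), repr(el.tail))
# foo ' AAA ' None
# bar ' BBB ' ' XXX '

# itertext() walks text and tails in document order.
all_text = "".join(foo.itertext())
```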

How do I install lxml on Mac OS X 10.7.4? I have exhausted all options

泄露秘密 submitted on 2019-11-29 14:40:38
I have tried various approaches, but no luck. I am using a MacBook Pro with OS X 10.7.4. I don't remember the last time I had so many problems installing anything for Python on my Mac. Please help me get lxml working on my local machine, so that I don't have to rely on SVN commits and updates to run it remotely on the Linux machine.

    $ sudo STATIC_DEPS=true /usr/bin/easy_install-2.7 lxml
    Password:
    Searching for lxml
    Reading http://pypi.python.org/simple/lxml/
    Reading http://codespeak.net/lxml
    Best match: lxml 2.3.4
    Downloading http://lxml.de
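Whatever install route finally works, you can confirm that `lxml` imports and see which library versions it uses. One such route is the `STATIC_DEPS=true` build shown above, which builds libxml2 and libxslt into lxml itself. The version constants on `lxml.etree` answer both questions. Below is a minimal sketch; the helper name `lxml_versions` is illustrative.

```python
from lxml import etree


def lxml_versions():
    """Return lxml's version and the libxml2/libxslt versions it reports."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    for name, version in lxml_versions().items():
        print(name.ljust(20), ".".join(str(part) for part in version))
```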

How to add a namespace to an attribute in lxml

核能气质少年 submitted on 2019-11-29 13:15:43
I'm trying to create an XML element that looks like this using Python and lxml:

    <resource href="Unit 4.html" adlcp:scormtype="sco">

I'm having trouble with the adlcp:scormtype attribute. I'm new to XML, so please correct me if I'm wrong: adlcp is a namespace, and scormtype is an attribute defined in the adlcp namespace, right? I'm not even sure this is the right question, but: how do I add an attribute from a non-default namespace to an element using lxml? I apologize in advance if this is a trivial question.

This is not a full reply but just
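In lxml, you put an attribute in a namespace by naming it in `{uri}local` (Clark) notation. You declare the `adlcp` prefix through the element's `nsmap`, and lxml then serializes the attribute with that prefix. Below is a minimal sketch. The namespace URI is an assumption (the one SCORM 1.2 manifests commonly declare for `adlcp`), so substitute whatever URI your manifest uses.

```python
from lxml import etree

# Assumed namespace URI for the adlcp prefix; replace with your manifest's.
ADLCP = "http://www.adlnet.org/xsd/adlcp_rootv1p2"

# nsmap declares xmlns:adlcp on the element so the prefix is used on output.
resource = etree.Element("resource", nsmap={"adlcp": ADLCP})
resource.set("href", "Unit 4.html")
# Clark notation "{uri}local" places the attribute in the adlcp namespace.
resource.set("{%s}scormtype" % ADLCP, "sco")

xml = etree.tostring(resource).decode()
```

In a full manifest, the `nsmap` would normally sit on the root element, and child elements such as `<resource>` inherit the prefix.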

How do I include namespaces in an XML file using lxml?

心不动则不痛 submitted on 2019-11-29 12:54:05
I am creating a new XML file from scratch using Python and the lxml library. The root element should look like this:

    <route xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.xxxx" version="1.1" xmlns:stm="http://xxxx/1/0/0" xsi:schemaLocation="http://xxxx/1/0/0 stm_extensions.xsd">

I need to include this namespace information in the root tag, as attributes of the route element, but I can't get it into the root declaration:

    from lxml import etree
    root = etree.Element("route",
        xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance",
        xmlns = "http://www.xxxxx",
        version = "1.1",
        xmlns: stm = "http://xxxxx
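lxml does not take namespace declarations as keyword arguments, and `xmlns:xsi=` is not valid Python syntax anyway. Declarations go in the `nsmap` argument, with the key `None` for the default namespace. Namespaced attributes such as `xsi:schemaLocation` are set with `{uri}local` (Clark) notation. Below is a minimal sketch that reuses the placeholder URIs exactly as they appear in the question's target XML; replace them with the real ones.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
DEFAULT_NS = "http://www.xxxx"   # placeholder copied from the question
STM = "http://xxxx/1/0/0"        # placeholder copied from the question

# None is the key for the default (unprefixed) namespace.
NSMAP = {None: DEFAULT_NS, "xsi": XSI, "stm": STM}

# With a default namespace declared, the root tag itself lives in it.
root = etree.Element("{%s}route" % DEFAULT_NS, nsmap=NSMAP)
root.set("version", "1.1")
# Namespaced attribute: Clark notation makes lxml emit the xsi: prefix.
root.set("{%s}schemaLocation" % XSI, "%s stm_extensions.xsd" % STM)

document = etree.tostring(
    root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
)
```

Child elements created with `etree.SubElement(root, "{%s}..." % DEFAULT_NS)` will then serialize without a prefix.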