lxml | 易学教程

Is there a way to disable urlencoding of anchor attributes in lxml

Question: I am using lxml 2.2.8 and trying to transform some existing HTML files into Django templates. The only problem I am having is that lxml URL-encodes the anchor name and href attributes. For example:
    <xsl:template match="a">
      <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}">
        <xsl:attribute name="name">{{item.name}}</xsl:attribute>
        <xsl

Replace text with HTML tag in LXML text element

Question: I have an lxml element:
    >>> lxml_element.text
    'hello BREAK world'
I need to replace the word BREAK with an HTML break tag, <br />. I tried simple text replacement:
    lxml_element.text.replace('BREAK', '<br />')
but it inserts the tag with escaped symbols, like &lt;br /&gt;. How do I solve this problem?
Answer 1: Here's how you could do it. Setting up a sample lxml element from your question:
    >>> import lxml
    >>> some_data = "hello BREAK world"
    >>> root = lxml.etree.fromstring(some_data)
    >>> root
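The escaping described in this question happens because a string assigned to `.text` is always stored as literal text, so the tag has to be added as a real child element. The following is a minimal sketch of one way to do that, not the answer from the original thread: it splits the text at the marker, keeps the first part in `.text`, and stores the remainder in the new element's `.tail`. The `<p>` wrapper and the helper name `replace_marker_with_br` are assumptions made for illustration.

```python
from lxml import etree


def replace_marker_with_br(element, marker="BREAK"):
    """Replace the first `marker` in element.text with a <br/> child element.

    Hypothetical helper for illustration; only element.text is inspected,
    not the tails of existing children.
    """
    text = element.text or ""
    before, found, after = text.partition(marker)
    if not found:
        return element  # marker absent, nothing to do
    element.text = before          # text before the marker stays in .text
    br = etree.Element("br")
    br.tail = after                # text after the marker follows the <br/>
    element.insert(0, br)          # make <br/> the first child
    return element


# The <p> wrapper is an assumption; the question does not show the real tag.
root = etree.fromstring("<p>hello BREAK world</p>")
replace_marker_with_br(root)
result = etree.tostring(root, encoding="unicode")
print(result)  # <p>hello <br/> world</p>
```

If the marker can occur several times, the same split would need to be repeated on each new `.tail`.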

ImportError: No module named lxml on Mac

Question: I am having a problem running a Python script. It shows this message:
    ImportError: No module named lxml
I suppose I have to install something called lxml, but I am new to Python and do not know much about that. From what I have read in other threads, I think I have two versions of Python installed on my Mac, but I am not sure. How can I solve this issue? Python version: 2.7.6. Mac OS X 10.9.2.
Answer 1: I installed it recently using pip, but before it would all work, I

Validate with three xml schemas as one combined schema in lxml?

Question: I am generating an XML document for which different XSDs have been provided for different parts (that is, definitions for some elements are in certain files, and definitions for others are in other files). The XSD files do not refer to each other. The schemas are:
    http://xmlgw.companieshouse.gov.uk/v2-1/schema/Egov_ch-v2-0.xsd
    http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/FormSubmission-v1-1.xsd
    http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/CompanyIncorporation-v1-2.xsd
Is there

Python 3.4 lxml.etree: Start tag expected, '<' not found, line 1, column 1

Question: Friends, as a novice at best, I have not been able to figure this out from what is available in forums. Ultimately, all I want to do is take some simple XML files and convert them all to CSV in one go (though this code handles just one at a time). It looks to me like there are no official namespaces, but I'm not sure. I have this code (I used one header, 'SubmittingSystemVendor', but I really want to write all of them to CSV):
    import csv
    import lxml.etree
    x = r'C:\Users\...\jh944.xml'
    with

How to get an XPath from selenium webelement or from lxml?

Question: I am using Selenium and I need to find the XPaths of some Selenium web elements. For example:
    import selenium.webdriver
    driver = selenium.webdriver.Firefox()
    element = driver.find_element_by_xpath(<some_xpath>)
    elements = element.find_elements_by_xpath(<some_relative_xpath>)
    for e in elements:
        print e.get_xpath()
I know I can't get the XPath from the element itself, but is there a nice way to get it anyway? I tried using lxml to parse the HTML, but it doesn't recognize the XPath, <some_xpath>
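On the lxml side of this question, an element tree can report a structural XPath for any of its elements through `getpath()`. The sketch below is only an illustration under assumptions: the inline HTML string stands in for a page source (with Selenium it would presumably come from `driver.page_source`), and the paths describe the tree as lxml parsed it, which may differ from the DOM a browser builds.

```python
from lxml import etree

# Stand-in for a real page source; this markup is an assumption for the demo.
page_source = "<html><body><div><p>one</p><p>two</p></div></body></html>"

# Parse as HTML and get the ElementTree that owns the root element.
root = etree.fromstring(page_source, etree.HTMLParser())
tree = root.getroottree()

# getpath() returns a positional XPath for each element.
paths = [tree.getpath(p) for p in root.iter("p")]

# Round trip: evaluating each path selects the element it was built from.
found = [tree.xpath(path)[0].text for path in paths]

for path, text in zip(paths, found):
    print(path, "->", text)
```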

LXML kills my CDATA sections

Question: I'm batch-converting a lot of XML files, changing their character encodings to UTF-8:
    with open(source_filename, "rb") as source:
        tree = etree.parse(source)
    with open(destination_filename, "wb") as destination:
        tree.write(destination, encoding="UTF-8", xml_declaration=True)
Unfortunately, it is destroying my CDATA sections and just escaping their contents instead.
Source: <d><![CDATA[áÌÀøÅàùÑÄéú ëÌÄé áÈàÅùÑ éäå''ä ðÄùÑÀôÌÈè (ùí ëå èæ)
Destination: <d>בְּרֵאשִׁית כִּי בָאֵשׁ
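By default lxml's XML parser merges CDATA sections into ordinary text, which is why they come back escaped. A commonly used remedy is to parse with `etree.XMLParser(strip_cdata=False)`. The sketch below shows the idea with in-memory buffers; the sample document, its ISO-8859-1 encoding, and the variable names are assumptions for illustration, not the files from the question.

```python
from io import BytesIO
from lxml import etree

# Sample input in a legacy encoding with a CDATA section (assumed content).
source = BytesIO(
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<d><![CDATA[caf\u00e9 < th\u00e9]]></d>".encode("iso-8859-1")
)

# strip_cdata=False asks the parser to keep CDATA sections as CDATA nodes.
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse(source, parser)

# Re-serialize as UTF-8, as in the question's conversion loop.
destination = BytesIO()
tree.write(destination, encoding="UTF-8", xml_declaration=True)
output = destination.getvalue().decode("utf-8")
print(output)
```

In the batch script, the same parser object would be passed as the second argument to `etree.parse(source, parser)`.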

Please help parse this html table using BeautifulSoup and lxml the pythonic way

Question: I have searched a lot about BeautifulSoup, and some people suggested lxml as the future of BeautifulSoup. While that makes sense, I am having a tough time parsing the following table from a whole list of tables on the webpage. I am interested in the three columns, which have a varying number of rows depending on the page and the time it was checked. A solution using both BeautifulSoup and lxml would be appreciated. That way I can ask the admin to install lxml on the dev machine. Desired output: Website Last Visited Last

How to handle adding elements and their parents using xpath

Question: I have a case where I need to add a tag to a certain other tag, given an XPath. Example XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <Assets>
      <asset name="Adham">
        <general>
          <services>
            <land/>
            <refuel/>
          </services>
        </general>
      </asset>
      <asset name="Test">
        <general>
          <Something/>
        </general>
      </asset>
    </Assets>
I want to add a <missions> tag to both assets. However, the second asset is missing the parent <services> tag, which I want to add. Each asset tag is stored in a variable (say node1,
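One possible approach to the situation in this question is to walk the path one step at a time and create whichever elements are missing. This is a sketch under assumptions, not the thread's answer: the helper `ensure_path` is hypothetical, it handles only plain tag names separated by `/` (no predicates or namespaces), and it assumes `<missions>` belongs under `general/services`.

```python
from lxml import etree


def ensure_path(node, path):
    """Return the element at `path` below `node`, creating missing steps.

    Hypothetical helper: `path` is a sequence of plain tag names separated
    by '/', not a full XPath expression.
    """
    current = node
    for tag in path.split("/"):
        child = current.find(tag)      # first direct child with this tag
        if child is None:
            child = etree.SubElement(current, tag)
        current = child
    return current


xml = b"""<Assets>
  <asset name="Adham">
    <general><services><land/><refuel/></services></general>
  </asset>
  <asset name="Test">
    <general><Something/></general>
  </asset>
</Assets>"""

root = etree.fromstring(xml)
for asset in root.findall("asset"):
    # Assumed target location for the new tag.
    ensure_path(asset, "general/services/missions")

print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```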

Python: adding xml schema attributes with lxml

Question: I've written a script that prints out all the .xml files in the current directory in XML format, but I can't figure out how to add the xmlns attributes to the top-level tag. The output I want to get is:
    <?xml version='1.0' encoding='utf-8'?>
    <databaseChangeLog xmlns="http://www.host.org/xml/ns/dbchangelog" xmlns:xsi="http://www.host.org/2001/XMLSchema-instance" xsi:schemaLocation="www.host.org/xml/ns/dbchangelog">
    <include file="cats.xml"/>
    <include file="dogs.xml"/>
    <include file="fish.xml"/
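In lxml, namespace declarations such as `xmlns` and `xmlns:xsi` are not set like ordinary attributes; they come from the `nsmap` given when the element is created, and namespaced attributes use the `{uri}name` form. The sketch below builds a document shaped like the desired output. The namespace URIs and file names are taken from the question; the fixed file list stands in for a directory scan and is an assumption.

```python
from lxml import etree

NS = "http://www.host.org/xml/ns/dbchangelog"
XSI = "http://www.host.org/2001/XMLSchema-instance"

# The None key declares the default namespace (plain xmlns="...").
root = etree.Element(
    "{%s}databaseChangeLog" % NS,
    nsmap={None: NS, "xsi": XSI},
)

# Namespaced attribute: serialized as xsi:schemaLocation.
root.set("{%s}schemaLocation" % XSI, "www.host.org/xml/ns/dbchangelog")

# Assumed file list; the original script scans the current directory.
for name in ("cats.xml", "dogs.xml", "fish.xml"):
    etree.SubElement(root, "{%s}include" % NS, file=name)

xml_bytes = etree.tostring(
    root, xml_declaration=True, encoding="utf-8", pretty_print=True
)
print(xml_bytes.decode("utf-8"))
```

Creating the children in the same namespace as the root keeps them free of extra prefixes or `xmlns=""` resets in the output.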