lxml

Is there a way to disable urlencoding of anchor attributes in lxml

吃可爱长大的小学妹 posted on 2019-12-22 07:38:02
Question: I am using lxml 2.2.8 and trying to transform some existing HTML files into Django templates. The only problem I am having is that lxml URL-encodes the anchor name and href attributes. For example: <xsl:template match="a"> <!-- anchor attribute href is urlencoded but the title is escaped --> <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}"> <!-- name tag is urlencoded --> <xsl:attribute name="name">{{item.name}}</xsl:attribute> <!-- but other attributes are not --> <xsl
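
The excerpt stops before any answer, so the following is only a sketch of one possible workaround, not the accepted solution. It assumes the URL-encoding comes from the HTML serializer (which escapes URI-valued attributes such as href and name on anchors) and that serializing with the XML output method is acceptable for the templates. The one-element input document is hypothetical.

```python
from lxml import etree

# Stylesheet modelled on the question, with the output method forced to "xml".
# Assumption: the XML serializer leaves "{" and "}" in attribute values alone.
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="a">
    <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}">
      <xsl:attribute name="name">{{item.name}}</xsl:attribute>
    </a>
  </xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))

# Hypothetical input: a single anchor element.
result = transform(etree.fromstring("<a/>"))

# str() serializes the result according to the stylesheet's xsl:output.
output = str(result)
print(output)
```

In attribute value templates `{{` is the escape for a literal `{`, which is why the question doubles the braces in href and title but not inside xsl:attribute.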

Replace text with HTML tag in LXML text element

本小妞迷上赌 posted on 2019-12-22 07:00:11
Question: I have an lxml element: >> lxml_element.text 'hello BREAK world' I need to replace the word BREAK with an HTML break tag, <br />. I tried simple text replacement: lxml_element.text.replace('BREAK', '<br />') but it inserts the tag with escaped symbols, like &lt;br /&gt;. How do I solve this problem? Answer 1: Here's how you could do it. Setting up a sample lxml element from your question: >>> import lxml.etree >>> some_data = "<b>hello BREAK world</b>" >>> root = lxml.etree.fromstring(some_data) >>> root
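
The answer above is cut off, so here is a minimal sketch of the usual approach rather than the answerer's exact code: keep the text before the marker in `.text`, insert a real `br` element, and put the remaining text in that element's `.tail`. The helper name `replace_marker_with_br` is made up for illustration, and it only handles the first marker in `.text`.

```python
from lxml import etree


def replace_marker_with_br(element, marker="BREAK"):
    """Replace the first marker in element.text with a real <br/> child."""
    text = element.text or ""
    before, found, after = text.partition(marker)
    if not found:
        return element  # nothing to replace
    element.text = before          # text that precedes the new tag
    br = etree.Element("br")
    br.tail = after                # text that follows the new tag
    element.insert(0, br)          # first child, directly after element.text
    return element


root = etree.fromstring("<b>hello BREAK world</b>")
replace_marker_with_br(root)
result = etree.tostring(root, encoding="unicode")
print(result)  # <b>hello <br/> world</b>
```

Assigning a string that contains `<br />` to `.text` always gets escaped, because `.text` holds character data, not markup.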

ImportError: No module named lxml on Mac

女生的网名这么多〃 posted on 2019-12-22 06:52:11
Question: I am having a problem running a Python script; it shows this message: ImportError: No module named lxml I suppose I have to install something called lxml, but I am very new to Python and don't know much about that. From what I have read in other threads, I think I have two versions of Python installed on my Mac, but I am not sure. How can I solve this issue? Python Version: 2.7.6 Mac OS X 10.9.2 Answer 1: I've installed recently using pip , but before it would all work, I
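
Since the question mentions two Python installations, a quick diagnostic can help before installing anything. This is a sketch that assumes a Python 3 interpreter (`importlib.util.find_spec` does not exist in Python 2.7); `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can import `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "installed": spec is not None,
        "location": getattr(spec, "origin", None),
    }


report = describe_module("lxml")
print(report)

if not report["installed"]:
    # Installing through the same interpreter avoids putting the package
    # into the other Python installation by accident.
    print("Try: {} -m pip install lxml".format(sys.executable))
```

Running the failing script and this check with the same `python` command shows whether the package was installed for a different interpreter.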

Validate with three xml schemas as one combined schema in lxml?

夙愿已清 posted on 2019-12-22 06:50:09
Question: I am generating an XML document for which different XSDs have been provided for different parts (that is, definitions for some elements are in certain files and definitions for others are in other files). The XSD files do not refer to each other. The schemas are: http://xmlgw.companieshouse.gov.uk/v2-1/schema/Egov_ch-v2-0.xsd http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/FormSubmission-v1-1.xsd http://xmlgw.companieshouse.gov.uk/v1-1/schema/forms/CompanyIncorporation-v1-2.xsd Is there
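
One way to treat several independent schemas as one is a small wrapper schema that does nothing but `xs:import` each of them. The sketch below shows the idea with two tiny stand-in schemas written to a temporary directory; the namespaces `urn:example:a` and `urn:example:b`, the file names, and the element names are all hypothetical and are not taken from the Companies House schemas.

```python
import os
import tempfile

from lxml import etree

# Template for a tiny stand-in schema with one global element.
SCHEMA_TEMPLATE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="{ns}" elementFormDefault="qualified">
  <xs:element name="{element}" type="{xsd_type}"/>
</xs:schema>
"""

# (namespace, file name, element name, type) for each hypothetical schema.
PARTS = [
    ("urn:example:a", "a.xsd", "envelope", "xs:string"),
    ("urn:example:b", "b.xsd", "form", "xs:int"),
]


def write_text(path, text):
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp_dir:
    for ns, filename, element, xsd_type in PARTS:
        write_text(
            os.path.join(tmp_dir, filename),
            SCHEMA_TEMPLATE.format(ns=ns, element=element, xsd_type=xsd_type),
        )

    # The wrapper declares nothing itself; it only imports the other schemas.
    imports = "\n".join(
        '  <xs:import namespace="{}" schemaLocation="{}"/>'.format(ns, filename)
        for ns, filename, _, _ in PARTS
    )
    wrapper_path = os.path.join(tmp_dir, "combined.xsd")
    write_text(
        wrapper_path,
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
        + imports
        + "\n</xs:schema>\n",
    )

    # Parsing from a file path lets relative schemaLocation values resolve.
    combined = etree.XMLSchema(etree.parse(wrapper_path))

    ok_a = combined.validate(etree.fromstring('<envelope xmlns="urn:example:a">hi</envelope>'))
    ok_b = combined.validate(etree.fromstring('<form xmlns="urn:example:b">42</form>'))
    bad_b = combined.validate(etree.fromstring('<form xmlns="urn:example:b">not a number</form>'))

print(ok_a, ok_b, bad_b)
```

For real schemas the `schemaLocation` values would point at local copies of the three XSD files, and the `namespace` attributes would have to match each file's `targetNamespace`.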

Python 3.4 lxml.etree: Start tag expected, '<' not found, line 1, column 1

廉价感情. posted on 2019-12-22 06:31:48
Question: Friends, as a novice at best, I have not been able to figure this out from what is available in forums. Ultimately, all I want to do is take some simple XML files and convert them all to CSV in one go (though this code handles just one at a time). It looks to me like there are no official namespaces, but I'm not sure. I have this code (I used one header, 'SubmittingSystemVendor', but I really want to write all of them to CSV): import csv import lxml.etree x = r'C:\Users\...\jh944.xml' with
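
The code in the question is cut off, so the actual cause cannot be confirmed here. One common way to get this exact message is handing a file path to `etree.fromstring`, which then tries to parse the path itself as XML. The sketch below reproduces that case with a hypothetical one-element file (the `Record` root tag and the value `Acme` are invented) and shows `etree.parse` reading the same file correctly.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for the XML file from the question.
xml_text = "<Record><SubmittingSystemVendor>Acme</SubmittingSystemVendor></Record>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)

    # fromstring() expects XML content; a path does not start with "<".
    try:
        etree.fromstring(path)
        error_message = None
    except etree.XMLSyntaxError as exc:
        error_message = str(exc)

    # parse() accepts a file name (or an open file object) instead.
    tree = etree.parse(path)
    vendor = tree.findtext("SubmittingSystemVendor")

print(error_message)
print(vendor)
```

If the file is already being read from disk, other things worth checking are leading whitespace or other characters before the first `<`.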

How to get an XPath from selenium webelement or from lxml?

不打扰是莪最后的温柔 posted on 2019-12-22 04:36:26
Question: I am using selenium and I need to find the XPaths of some selenium web elements. For example: import selenium.webdriver driver = selenium.webdriver.Firefox() element = driver.find_element_by_xpath(<some_xpath>) elements = element.find_elements_by_xpath(<some_relative_xpath>) for e in elements: print e.get_xpath() I know I can't get the XPath from the element itself, but is there a nice way to get it anyway? I tried using lxml to parse the HTML, but it doesn't recognize the XPath, <some_xpath>
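
On the lxml side, an element tree can report a structural XPath for any of its elements through `getpath`. The sketch below uses a small hypothetical page in place of the browser's HTML (which Selenium would supply via `driver.page_source`); it does not show how to map a Selenium web element back to an lxml element.

```python
from lxml import html

# Hypothetical page source standing in for driver.page_source.
page_source = "<html><body><div><p>one</p><p>two</p></div></body></html>"

doc = html.fromstring(page_source)
tree = doc.getroottree()

# getpath() returns an absolute, position-based XPath for each element.
paths = [tree.getpath(p) for p in doc.xpath("//p")]
print(paths)  # ['/html/body/div/p[1]', '/html/body/div/p[2]']

# The generated path selects the same element again.
second = doc.xpath(paths[1])[0]
print(second.text)  # two
```

The paths describe the tree as lxml parsed it, so they can differ from what the browser sees if the page is modified by JavaScript after loading.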

LXML kills my CDATA sections

妖精的绣舞 posted on 2019-12-22 04:17:24
Question: I'm batch-converting a lot of XML files, changing their character encodings to UTF-8: with open(source_filename, "rb") as source: tree = etree.parse(source) with open(destination_filename, "wb") as destination: tree.write(destination, encoding="UTF-8", xml_declaration=True) Unfortunately, it is destroying my CDATA sections and just escaping them instead. Source : <d><![CDATA[áÌÀøÅàùÑÄéú ëÌÄé áÈàÅùÑ éäå''ä ðÄùÑÀôÌÈè <small><small>(ùí ëå èæ)</small></small> Destination : <d>בְּרֵאשִׁית כִּי בָאֵשׁ
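
By default lxml's XML parser replaces CDATA sections with plain text, which is then escaped on output. Passing a parser created with `strip_cdata=False` keeps them. A minimal sketch with a hypothetical in-memory document instead of the question's files:

```python
from lxml import etree

# Hypothetical source document with a CDATA section containing markup.
source = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<d><![CDATA[text <small>(note)</small>]]></d>"
)

# Default parser: the CDATA section becomes ordinary, escaped text.
default_root = etree.fromstring(source)
default_out = etree.tostring(default_root, encoding="UTF-8", xml_declaration=True)

# Parser that keeps CDATA sections intact.
keep_cdata = etree.XMLParser(strip_cdata=False)
kept_root = etree.fromstring(source, keep_cdata)
kept_out = etree.tostring(kept_root, encoding="UTF-8", xml_declaration=True)

print(default_out)
print(kept_out)
```

With files, the same parser object can be passed as the second argument: `etree.parse(source, keep_cdata)`.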

Please help parse this html table using BeautifulSoup and lxml the pythonic way

懵懂的女人 posted on 2019-12-22 01:32:09
Question: I have searched a lot about BeautifulSoup, and some people suggested lxml as the future of BeautifulSoup. While that makes sense, I am having a tough time parsing the following table from a whole list of tables on the webpage. I am interested in the three columns, with a number of rows that varies depending on the page and the time it was checked. A solution using both BeautifulSoup and lxml would be appreciated; that way I can ask the admin to install lxml on the dev machine. Desired output : Website Last Visited Last
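
The table itself is not included in the excerpt, so the sketch below uses an invented two-column table (the `id="sites"` attribute, the rows, and the dates are hypothetical, and only the two column names visible above are used). It shows the lxml half of the request: select one table out of several and turn its rows into lists of cell text.

```python
from lxml import html

# Hypothetical page with more than one table; only "sites" is wanted.
page = """
<html><body>
  <table id="other"><tr><td>ignore me</td></tr></table>
  <table id="sites">
    <tr><th>Website</th><th>Last Visited</th></tr>
    <tr><td>example.com</td><td>2019-12-01</td></tr>
    <tr><td> example.org </td><td>2019-12-15</td></tr>
  </table>
</body></html>
"""

doc = html.fromstring(page)

rows = [
    [cell.text_content().strip() for cell in tr.xpath("./th | ./td")]
    for tr in doc.xpath("//table[@id='sites']//tr")
]

for row in rows:
    print(row)
```

Using `//tr` under the table works whether or not the markup contains a `tbody` element; on a real page the table would need some other distinguishing XPath if it has no id.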

How to handle adding elements and their parents using xpath

你。 posted on 2019-12-21 19:27:18
Question: OK, I have a case where I need to add a tag to a certain other tag, given an XPath. Example XML: <?xml version="1.0" encoding="UTF-8"?> <Assets> <asset name="Adham"> <general> <services> <land/> <refuel/> </services> </general> </asset> <asset name="Test"> <general> <Something/> </general> </asset> </Assets> I want to add a <missions> tag to both assets. However, the second asset is missing the parent <services> tag, which I want to add. Each asset tag is stored in a variable (say node1,
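
A simple way to handle missing parents is to walk the path one step at a time and create whatever is not there. The sketch below assumes the target is `general/services/missions` (the excerpt does not state the exact path) and only supports plain slash-separated tag names, not full XPath; `ensure_path` is a hypothetical helper name.

```python
from lxml import etree

XML = """\
<Assets>
  <asset name="Adham">
    <general><services><land/><refuel/></services></general>
  </asset>
  <asset name="Test">
    <general><Something/></general>
  </asset>
</Assets>
"""


def ensure_path(node, path):
    """Return the element at `path` below `node`, creating missing steps."""
    current = node
    for tag in path.split("/"):
        child = current.find(tag)
        if child is None:
            child = etree.SubElement(current, tag)  # appended as last child
        current = child
    return current


root = etree.fromstring(XML)
for asset in root.findall("asset"):
    ensure_path(asset, "general/services/missions")

print(etree.tostring(root, encoding="unicode"))
```

Because existing elements are reused, calling `ensure_path` twice with the same path does not create duplicates.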

Python: adding xml schema attributes with lxml

时间秒杀一切 posted on 2019-12-21 17:08:11
Question: I've written a script that prints out all the .xml files in the current directory in XML format, but I can't figure out how to add the xmlns attributes to the top-level tag. The output I want to get is: <?xml version='1.0' encoding='utf-8'?> <databaseChangeLog xmlns="http://www.host.org/xml/ns/dbchangelog" xmlns:xsi="http://www.host.org/2001/XMLSchema-instance" xsi:schemaLocation="www.host.org/xml/ns/dbchangelog"> <include file="cats.xml"/> <include file="dogs.xml"/> <include file="fish.xml"/
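
In lxml, namespace declarations are not set as ordinary attributes; they come from the `nsmap` argument when the element is created, and namespaced attributes use the `{uri}name` form. A minimal sketch using the URIs from the desired output, with the file list hard-coded instead of read from the current directory:

```python
from lxml import etree

NS = "http://www.host.org/xml/ns/dbchangelog"
XSI = "http://www.host.org/2001/XMLSchema-instance"

# None as the prefix declares the default namespace (xmlns="...").
root = etree.Element(
    "{%s}databaseChangeLog" % NS,
    nsmap={None: NS, "xsi": XSI},
)
root.set("{%s}schemaLocation" % XSI, "www.host.org/xml/ns/dbchangelog")

# Hard-coded here; the original script lists *.xml in the working directory.
for name in ["cats.xml", "dogs.xml", "fish.xml"]:
    # Children are created in the same namespace so no extra xmlns appears.
    etree.SubElement(root, "{%s}include" % NS, file=name)

out = etree.tostring(
    root, xml_declaration=True, encoding="utf-8", pretty_print=True
)
print(out.decode("utf-8"))
```

The `include` elements live in the default namespace in the desired output, which is why they are created with the qualified tag rather than a bare `include`.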