elementtree

Python: How to replace a character in a XML file with a new node?

对着背影说爱祢, submitted on 2019-12-25 11:06:11
Question: I want to replace every colon ":" in the node below with a new node "<colon/>". I want this:

    <shortName>Trigger:Digital Edge:Source</shortName>

to become this:

    <shortName>Trigger<colon/>Digital Edge<colon/>Source</shortName>

I have already tried a string search and replace, but in the output every "<" and ">" is changed to &lt; and &gt;. Can anyone suggest a technique for this? Thank you.

Answer 1: The idea is to get the node text, split it by colon
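The split-by-colon idea can be sketched with the standard-library ElementTree as follows. This is a minimal sketch, not the answer's own code: the helper name `split_on_colon` is made up for illustration, and it assumes the element has no child elements yet.

```python
import xml.etree.ElementTree as ET

def split_on_colon(elem):
    """Replace each ':' in elem.text with an empty <colon/> child element.

    Hypothetical helper; assumes elem has no existing children.
    """
    parts = (elem.text or "").split(":")
    elem.text = parts[0]                  # text before the first colon
    for part in parts[1:]:
        colon = ET.SubElement(elem, "colon")
        colon.tail = part                 # text that follows this <colon/>
    return elem

node = ET.fromstring("<shortName>Trigger:Digital Edge:Source</shortName>")
result = ET.tostring(split_on_colon(node), encoding="unicode")
print(result)
```

Because the new nodes are real elements rather than text, the serializer does not escape them. ElementTree writes empty elements as `<colon />`, which is equivalent XML.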

Manage quotation marks in XPath (lxml)

和自甴很熟, submitted on 2019-12-25 08:34:42
Question: I want to extract elements from the table 'MANUFACTURING AT A GLANCE' on the given website, but the name of one row contains a ' (single quote), which breaks my XPath syntax. How do I overcome this? The code works for the other rows.

    import requests
    from lxml import html, etree
    ism_pmi_url = 'https://www.instituteforsupplymanagement.org/ismreport/mfgrob.cfm?SSO=1'
    page = requests.get(ism_pmi_url)
    tree = html.fromstring(page.content)
    PMI_CustomerInventories = tree.xpath('//strong[text()=
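One common way around this is to delimit the XPath string literal with double quotes whenever the value itself contains a single quote. The sketch below shows that with the standard-library ElementTree (Python 3.7+ for the `[.="..."]` predicate) rather than lxml, and both the row label and the sample markup are assumptions, since the real row name is cut off above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real table; the actual row label is not shown in the question.
doc = ET.fromstring(
    "<table><tr>"
    "<td><strong>Customers' Inventories</strong></td><td>45.0</td>"
    "</tr></table>"
)

label = "Customers' Inventories"
# The path literal uses double quotes, so the apostrophe needs no escaping.
matches = doc.findall('.//strong[.="{}"]'.format(label))
print([m.text for m in matches])
```

With lxml, `xpath()` also accepts XPath variables as keyword arguments, for example `tree.xpath('//strong[text()=$label]', label=label)`, which sidesteps quoting altogether.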

ElementTree will not parse special characters with Python 2.7

安稳与你, submitted on 2019-12-25 04:50:23
Question: I had to rewrite my Python script from Python 3 to Python 2, and since then I have had a problem parsing special characters with ElementTree. This is a piece of my XML:

    <account number="89890000" type="Kostnad" taxCode="597" vatCode="">Avsättning egenavgifter</account>

This is the output when I parse this row:

    ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avs\xc3\xa4ttning egenavgifter')

So the problem seems to be with the character "ä". This is how I do it in the code: sys
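The `\xc3\xa4` in that output is the UTF-8 byte sequence for "ä", which suggests the value is a UTF-8-encoded byte string shown through its repr, not corrupted data. A small sketch of that reading, written in Python 3 syntax for illustration (the question targets Python 2.7, where `str.decode('utf-8')` plays the same role); the variable names are made up:

```python
# The bytes as they appear in the printed tuple.
raw = b"Avs\xc3\xa4ttning egenavgifter"

# Decoding as UTF-8 recovers the original text.
name = raw.decode("utf-8")
print(name)

# Round trip: encoding the text again yields the same bytes.
assert name.encode("utf-8") == raw
```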

How can one replace an element in lxml?

馋奶兔, submitted on 2019-12-25 02:55:22
Question: I get text (data entered by CRM users) from a web service that returns it in a "terrifying format". I filter it with Python before using the data, but when I remove the line breaks (br), the text is removed as well. The code is as follows:

    description = ''' <div id="highlight" class="section"> <p> text............... </p> <br> <h1>TITLE</h1> <p>Multiple text <br>  </p> <ul> <li>bad layer....</li> </ul> <p> <br>subTitle </p> <p> </p> <p style="text-align: center;"> <br>Text1 <br
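In both lxml and ElementTree, the text after a `<br>` is stored as that element's `tail`, so removing the element also discards the tail unless it is moved first. A minimal sketch with the standard-library ElementTree; the helper name `drop_keep_tail` is hypothetical, and the input here is well-formed XML (`<br/>`), whereas the HTML above would need an HTML parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET

def drop_keep_tail(root, tag):
    """Remove every <tag> element but keep the text that followed it."""
    for parent in list(root.iter()):
        prev = None                       # last sibling kept so far
        for child in list(parent):
            if child.tag == tag:
                tail = child.tail or ""
                if prev is None:
                    # No earlier sibling: the tail joins the parent's text.
                    parent.text = (parent.text or "") + tail
                else:
                    # Otherwise it joins the previous sibling's tail.
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return root

fragment = ET.fromstring("<p><br/>subTitle <b>x</b><br/>Text1</p>")
cleaned = ET.tostring(drop_keep_tail(fragment, "br"), encoding="unicode")
print(cleaned)
```

For trees parsed with `lxml.html`, the element method `drop_tree()` performs the same tail-preserving removal.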

python read complex xml with ElementTree

谁都会走, submitted on 2019-12-25 02:42:53
Question: I am trying to parse this XML file with Python's ElementTree:

    <?xml version="1.0" encoding="Windows-1250"?>
    <rsp:responsePack version="2.0" id="001" state="ok" note="" programVersion="9801.8 (19.5.2011)"
        xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd"
        xmlns:rdc="http://www.stormware.cz/schema/version_2/documentresponse.xsd"
        xmlns:typ="http://www.stormware.cz/schema/version_2/type.xsd"
        xmlns:lst="http://www.stormware.cz/schema/version_2/list.xsd"
        xmlns:lStk="http://www
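With namespaced documents like this, ElementTree lookups need the namespace URI, usually supplied through a prefix-to-URI mapping. A minimal sketch: only the `rsp` URI is taken from the excerpt above, while the child element `rsp:responsePackItem` and its attributes are assumptions, since the rest of the file is cut off.

```python
import xml.etree.ElementTree as ET

# Prefix map used only on the Python side; the prefix need not match the file's.
NS = {"rsp": "http://www.stormware.cz/schema/version_2/response.xsd"}

xml_text = (
    '<rsp:responsePack version="2.0" id="001" state="ok" '
    'xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd">'
    '<rsp:responsePackItem id="a1" state="ok"/>'   # hypothetical child element
    '</rsp:responsePack>'
)

root = ET.fromstring(xml_text)
print(root.tag)              # tag in {namespace-uri}localname form
print(root.get("state"))     # unprefixed attributes are looked up by plain name

items = root.findall("rsp:responsePackItem", NS)
print([item.get("id") for item in items])
```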

Get attribute of complex element using lxml

£可爱£侵袭症+, submitted on 2019-12-25 02:42:31
Question: I have a simple XML file like the one below:

    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" />BMW</brandName>
    <maxspeed>
      <value>250</value>
      <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
    </maxspeed>

I want to parse it using lxml and get its values. For brandName, all it needs is:

    'brand_name' : m.findtext(NS+'brandName')

If I want the abbrev attribute of it, I try:

    'brand_name' : m.findtext(NS+'brandName').attrib['abbrev']

With
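`findtext()` returns a plain string, which has no `.attrib`; attributes live on the element that `find()` returns. A minimal sketch using the standard-library ElementTree, whose `find`/`findtext`/`get` calls behave the same way in lxml. It assumes a well-formed, namespace-free version of the sample wrapped in a hypothetical `<car>` root.

```python
import xml.etree.ElementTree as ET

car = ET.fromstring(
    '<car>'
    '<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>'
    '<maxspeed><value>250</value>'
    '<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph"/>'
    '</maxspeed>'
    '</car>'
)

brand = car.find("brandName")            # an Element, so attributes are available
info = {
    "brand_name": brand.text,
    "brand_abbrev": brand.get("abbrev"),
    "max_speed": car.findtext("maxspeed/value"),
    "speed_unit": car.find("maxspeed/unit").get("abbrev"),
}
print(info)
```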

Why does xml package modify my xml file in Python3?

寵の児, submitted on 2019-12-25 01:53:16
Question: I use the xml library in Python 3.5 to read and write an XML file. I don't modify the file; I just open it and write it back. But the library modifies the file. Why is it modified, and how can I prevent this? For example, I just want to replace a specific tag or its value in a fairly complex XML file without losing any other information. This is the example file:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <movie>
      <title>Der Eisbär</title>
      <ids>
        <entry>
          <key>tmdb</key>
          <value xsi:type="xs:int" xmlns:xs
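Two changes people often notice after an ElementTree round trip are unknown namespace prefixes rewritten to `ns0`, `ns1`, and so on, and the XML declaration being dropped. A minimal sketch of how `register_namespace` and `xml_declaration=True` address those two points; the `m` prefix and the `http://example.com/meta` URI are invented for illustration and do not come from the file above.

```python
import io
import xml.etree.ElementTree as ET

META = "http://example.com/meta"         # hypothetical namespace URI
ET.register_namespace("m", META)         # keep the "m" prefix instead of ns0

src = '<movie xmlns:m="http://example.com/meta"><m:title>Der Eisbär</m:title></movie>'
tree = ET.ElementTree(ET.fromstring(src))

buf = io.BytesIO()
# Write the declaration explicitly and encode the output as UTF-8.
tree.write(buf, encoding="UTF-8", xml_declaration=True)
out = buf.getvalue().decode("utf-8")
print(out)
```

ElementTree still normalizes other details, such as where namespace declarations are written, so a byte-identical round trip is not guaranteed.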

How do I solve IndexError: single positional indexer is out-of-bounds?

你说的曾经没有我的故事, submitted on 2019-12-25 01:48:58
Question: My expected output is as follows:

    report_num|data
    1 rr
    1 r
    1 a

However, I get "'Series' object has no attribute 'value'" and an index out-of-bounds error. I have also tried changing iloc[0] to loc[] or loc[:0], but that didn't work. Hoping someone can help.

    def report_num(xml_file,df_cols):
        global df
        xtree = et.parse(xml_file)
        xroot = xtree.getroot()
        out_xml = pd.DataFrame(columns=df_cols)
        for node in xroot.findall('r:ReportHeader/r:Section[1]/r:Subreport/r:Details/r:Section[3]/r:Field',namespace):

rewrite ElementTree code in lxml

跟風遠走, submitted on 2019-12-24 20:27:21
Question: I am writing code to extract text from an XML file using ElementTree, but I found that lxml offers XPath support, which is more convenient. So I want to know how this line could be rewritten in lxml:

    if x.nodeName == 'a:pPr' and x.getAttribute('lvl') == '2' and x.hasAttribute('marL') == False:

So far it has been suggested that I use this:

    '/p:sld/p:cSld/p:spTree/p:sp/p:nvSpPr/p:nvPr/x[@type="body" and @sz="quarter" and @marL]'

I hope my question is clear!

Answer 1: I'm assuming you are already at a

Force ElementTree to use closing tag

六月ゝ 毕业季﹏, submitted on 2019-12-24 18:49:31
Question: Instead of having:

    <child name="George"/>

in the XML file, I need to have:

    <child name="George"></child>

An ugly workaround is to write a whitespace character as the text (not an empty string, as that is ignored):

    import xml.etree.ElementTree as ET
    ch = ET.SubElement(parent, 'child')
    ch.set('name', 'George')
    ch.text = ' '

Then, since I am using Python 2.7, I read "Python etree control empty tag format" and tried the html method, like so:

    ch = ET.tostring(ET.fromstring(ch), method='html')

but this gave:
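For readers who are not tied to Python 2.7: from Python 3.4 onward, `ET.tostring()` and `ElementTree.write()` accept a keyword-only `short_empty_elements` flag that controls exactly this. A minimal sketch, assuming Python 3.4+ and a hypothetical `<parent>` root:

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")            # hypothetical root for the example
child = ET.SubElement(parent, "child")
child.set("name", "George")

# short_empty_elements=False writes <child ...></child> instead of <child ... />.
xml_out = ET.tostring(parent, encoding="unicode", short_empty_elements=False)
print(xml_out)
```

This flag does not exist in Python 2.7, so it does not replace the whitespace workaround described in the question for that version.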