lxml | 易学教程

Merge two XML files by matching elements by attribute value

Question: I have two XML files that I'm trying to merge. I looked at previous questions, but I couldn't solve my problem from reading them. What I think makes my situation unique is that I have to find elements by attribute value and then merge them into the other file. I have two files: one is an English translation catalog and the other is a Japanese translation catalog. Please see below. In the code below you'll see the XML has three elements which I will be merging children on -
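The excerpt is cut off before the actual catalogs appear, so the following is only a minimal sketch of matching elements by attribute value and copying children across. The element name `entry`, the attribute `key`, and the sample data are all hypothetical. The import falls back to the standard library so the snippet runs without lxml installed; only calls common to both libraries are used.

```python
import copy

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical catalogs; the real files are not shown in the excerpt.
EN = ('<catalog><entry key="hello"><en>Hello</en></entry>'
      '<entry key="bye"><en>Goodbye</en></entry></catalog>')
JA = ('<catalog><entry key="hello"><ja>こんにちは</ja></entry>'
      '<entry key="bye"><ja>さようなら</ja></entry></catalog>')


def merge_by_attribute(target, source, tag, attr):
    """Append the children of each source element to the target element
    that has the same value for `attr`."""
    # Index the target elements by attribute value for direct lookup.
    index = {el.get(attr): el for el in target.iter(tag)}
    for src in source.iter(tag):
        dest = index.get(src.get(attr))
        if dest is None:
            continue  # no counterpart in the target file
        for child in src:
            # Copy rather than move, so the source tree stays intact.
            dest.append(copy.deepcopy(child))
    return target


en_root = etree.fromstring(EN)
ja_root = etree.fromstring(JA)
merged = merge_by_attribute(en_root, ja_root, "entry", "key")
```

Building the index first avoids running one search per element, which matters if the catalogs are large.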

Retrieving tail text from html

Question: Python 2.7, using lxml. I have some annoyingly formed HTML that looks like this: <td> "John" "123 Main st. " "New York " "Sally" "101 California St. " "San Francisco " </td> So basically it's a single td with a ton of stuff in it. I'm trying to compile a list or dict of the names and their addresses. So far, I've gotten a list of nodes with names using tree.xpath('//td/b') . So let's assume I'm currently on the b node for John. I'm trying to get
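The markup in the excerpt lost its tags, so the layout below (names in `b` elements, address lines separated by `br`) is an assumption based on the `//td/b` query. Under that assumption, text that follows an element sits in that element's `.tail`, and a sketch of collecting it could look like this. The snippet is well-formed, so it is parsed as XML here and also runs with the standard library if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Assumed layout: <b> holds the name, address lines follow as tail text.
SNIPPET = ("<td><b>John</b>123 Main st.<br/>New York<br/>"
           "<b>Sally</b>101 California St.<br/>San Francisco<br/></td>")


def names_and_addresses(td):
    """Map each name in a <b> child to the tail text that follows it."""
    result = {}
    current = None
    for child in td:
        if child.tag == "b":
            # A new name starts a new address list.
            current = child.text.strip()
            result[current] = []
        # Text after any child (<b> or <br/>) lives in that child's .tail.
        if current is not None and child.tail and child.tail.strip():
            result[current].append(child.tail.strip())
    return result


td = etree.fromstring(SNIPPET)
people = names_and_addresses(td)
```

For a real page the tree would more likely come from `lxml.html.fromstring`, and the same `.tail` attribute applies there.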

python: get data from changing span class using lxml xpath

Question: I want to extract 'Return On Assets' from the WSJ website. However, my code is not robust enough to work in different conditions. I am able to extract data for ticker 'SCGM' using the code below, but it fails for 'AASIA': from lxml import html import requests StockData =['SCGM','AASIA'] page_wsj1 = requests.get('http://quotes.wsj.com/MY/'+StockData[x]+'/financials') wsj1 = html.fromstring(page_wsj1.content) wsj_fig = wsj1.xpath('//span[@class="marketDelta

LXML unable to retrieve webpage with error “failed to load HTTP resource”

Question: Hi, I tried opening the link below in a browser and it works, but it does not work in the code. The link is actually a combination of a news site and the path of the article, which is read from another file, url.txt. I tried the code with a normal website (www.google.com) and it works perfectly. import sys import MySQLdb from mechanize import Browser from bs4 import BeautifulSoup, SoupStrainer from nltk import word_tokenize from nltk.tokenize import * import urllib2 import nltk, re, pprint import

Dynamic search through xml attributes using lxml and xpath in python

Question: I am working to move nested XML data into a hierarchical data frame. I was able to get all of the data out of the XML thanks to help on SO. However, I am now working to clean up and shape the data that I extract before output, because I will be doing this thousands of times. UPDATED: THIS IS WHAT I EVENTUALLY WANT OUT. I cannot seem to fetch just the Time and value for a channel dynamically. The channel names will change for each file. When channel = txt1[0] (for this file, txt1[0]="blah" )

lxml find element by name, but use variable in search

Question: I have a problem with the find function in lxml, but I think this is more of a generic question: how do I say that I want to compare against the variable's value, not the name itself? Here is the code that works: step = xml_obj.find('.//step/name[text()="Design"]').getparent() If I try to replace the string with a variable, the result is always None: stepn = 'Design' step = xml_obj.find('.//step/name[text()=stepn]').getparent() 'NoneType' object has no attribute 'getparent' Answer 1: stepn = 'Design' step =
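The answer is cut off in the excerpt, so this is a sketch of one common approach rather than the original answer. Inside the path string, `stepn` is just text and is never looked up as a Python variable, so the value has to be formatted into the expression. The sample document and the helper `find_step` are hypothetical. The predicate `step[name='...']` selects the `step` directly, which avoids `getparent()` and also works with the standard library fallback.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical document; the original XML is not shown in the excerpt.
DOC = ("<workflow>"
       "<step><name>Design</name><owner>ann</owner></step>"
       "<step><name>Build</name><owner>bob</owner></step>"
       "</workflow>")
xml_obj = etree.fromstring(DOC)


def find_step(root, step_name):
    """Return the first <step> whose <name> child equals step_name, or None."""
    if "'" in step_name:
        # The value is pasted inside single quotes, so a quote would break
        # the expression; reject it instead of building a bad path.
        raise ValueError("step name must not contain a single quote")
    return root.find(".//step[name='{}']".format(step_name))


stepn = "Design"
step = find_step(xml_obj, stepn)
```

With lxml itself, the `xpath()` method also accepts XPath variables passed as keyword arguments, for example `xml_obj.xpath('.//step[name=$n]', n=stepn)`, which sidesteps the quoting problem.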

Write xml from list of path/values

Question: This is a follow-up to the previous question: Write xml with a path and value. I now want to add two additional things: 1) attributes and 2) multiple items under a parent node. Here is the list of paths I have: [ {'Path': 'Item/Info/Name', 'Value': 'Body HD'}, {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'}, {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'}, {'Path': 'Item/Genres/Genre', 'Value': 'Action'}, {'Path': 'Item/Genres/Genre',
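One way to approach this, sketched under a few assumptions: every path has at least two steps, attribute predicates use the `[@Name="value"]` form seen above, and values never contain `/`. Intermediate elements are reused when tag and attributes match, while the last step always creates a new element, so repeated `Genre` rows become siblings. The helper names are hypothetical, and the final `Comedy` row is made up because the original list is truncated.

```python
import re

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

ATTR_RE = re.compile(r'\[@(\w+)="([^"]*)"\]')

ROWS = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name',
     'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    # Made-up row for illustration; the original list is cut off here.
    {'Path': 'Item/Genres/Genre', 'Value': 'Comedy'},
]


def parse_step(step):
    """Split 'Tag[@a="1"][@b="2"]' into ('Tag', {'a': '1', 'b': '2'})."""
    tag = step.split("[", 1)[0]
    return tag, dict(ATTR_RE.findall(step))


def find_child(parent, tag, attrs):
    """Return the first child with this tag and these attribute values."""
    for child in parent:
        if child.tag == tag and all(child.get(k) == v for k, v in attrs.items()):
            return child
    return None


def build_tree(rows):
    root = None
    for row in rows:
        steps = [parse_step(s) for s in row["Path"].split("/")]
        if root is None:
            root = etree.Element(steps[0][0], steps[0][1])
        node = root
        # Reuse or create every intermediate element.
        for tag, attrs in steps[1:-1]:
            child = find_child(node, tag, attrs)
            if child is None:
                child = etree.SubElement(node, tag, attrs)
            node = child
        # The leaf is always new, so repeated paths yield repeated elements.
        leaf = etree.SubElement(node, steps[-1][0], steps[-1][1])
        leaf.text = row["Value"]
    return root


root = build_tree(ROWS)
```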

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml [duplicate]

Question: This question already has answers here: beautifulsoup won't recognize lxml (2 answers). Closed 3 years ago. Can you please suggest a fix? It downloads almost all the images from imgur pages with a single image; I'm not sure why it is not working in this case or how to fix it. elif 'imgur.com' in submission.url and not (submission.url.endswith('gif') or submission.url.endswith('webm') or submission.url.endswith('mp4') or 'all' in submission.url or '#' in submission.url or '/a/' in submission.url

“The SOAP request must use SOAP 1.1…”

Question: I am writing some code that generates XML and, using the requests library, POSTs the XML to Salesforce.com's SOAP service. Here is the code that I'm using to generate the XML: from lxml import etree class SalesforceLeadConverter(object): def __init__(self, session_id, lead_id, **kwargs): self.session_id = session_id self.lead_id = lead_id def build_xml(self): root = etree.Element( '{soapenv}Envelope', soapenv='http://schemas.xmlsoap.org/soap
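One thing that stands out in the code above: in the `{...}Tag` notation the text between the braces is the namespace URI itself, so `'{soapenv}Envelope'` creates an element in a namespace literally named `soapenv`, and the `soapenv=` keyword becomes an ordinary attribute. Whether that is the cause of the error in the title cannot be confirmed from the excerpt, so the following is only a sketch of building an envelope in the SOAP 1.1 namespace. The function `build_envelope` is hypothetical and the Salesforce-specific header and body contents are left out.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Namespace URI of the SOAP 1.1 envelope.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Ask the serializer to use the conventional "soapenv" prefix.
etree.register_namespace("soapenv", SOAP_ENV_NS)


def build_envelope():
    """Build an empty Envelope with Header and Body in the SOAP 1.1 namespace."""
    # The braces hold the full namespace URI, not the prefix.
    envelope = etree.Element("{%s}Envelope" % SOAP_ENV_NS)
    etree.SubElement(envelope, "{%s}Header" % SOAP_ENV_NS)
    etree.SubElement(envelope, "{%s}Body" % SOAP_ENV_NS)
    return envelope


envelope = build_envelope()
xml_bytes = etree.tostring(envelope)
```

With lxml, passing `nsmap={'soapenv': SOAP_ENV_NS}` to `etree.Element` is another way to bind the prefix on a single element instead of registering it globally.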

Python ElementTree XML Modifying Elements with Multiple Values

Question: Using Python 2.7 and lxml , how do I modify XML elements with multiple values? E.g. <Title> <Playcount>1</Playcount> <Genre>Adventure</Genre> <Genre>Comedy</Genre> <Genre>Action</Genre> </Title> It is straightforward to modify Playcount , as it has a single value. How do I modify Genre , which has multiple values? E.g.: How do I delete all but the first genre? How do I add a genre? How do I change every Baseball genre to Sports? Thanks. Answer 1: Like this: from lxml import etree parser = etree
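The answer is truncated, so this is a sketch of one way to cover the three operations rather than the original answer. The helper names are hypothetical. `findall("Genre")` returns a list of the direct `Genre` children, which makes it safe to remove elements while looping over it.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

TITLE = ("<Title><Playcount>1</Playcount>"
         "<Genre>Adventure</Genre><Genre>Comedy</Genre><Genre>Action</Genre>"
         "</Title>")
title = etree.fromstring(TITLE)


def add_genre(title, name):
    """Append a new <Genre> child with the given text."""
    genre = etree.SubElement(title, "Genre")
    genre.text = name
    return genre


def rename_genre(title, old, new):
    """Change the text of every <Genre> equal to `old`."""
    for genre in title.findall("Genre"):
        if genre.text == old:
            genre.text = new


def keep_first_genre(title):
    """Remove every <Genre> except the first one."""
    for genre in title.findall("Genre")[1:]:
        title.remove(genre)


add_genre(title, "Baseball")
rename_genre(title, "Baseball", "Sports")
after_rename = [g.text for g in title.findall("Genre")]

keep_first_genre(title)
after_trim = [g.text for g in title.findall("Genre")]
```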