lxml

Merge two XML files by matching elements by attribute value

你离开我真会死。 submitted on 2019-12-12 04:36:44
Question: I have two XML files that I'm trying to merge. I looked at previous questions, but I couldn't solve my problem from reading those. What I think makes my situation unique is that I have to find elements by attribute value and then merge them into the other file. One file is an English translation catalog and the second is a Japanese translation catalog. Please see below. In the code below you'll see the XML has three elements which I will be merging children on -
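The excerpt is cut off before the catalog markup, so the following is only a minimal sketch of one way to merge by attribute value. The element name `msg`, the attribute `id`, the sample strings, and the helper `merge_by_attribute` are all hypothetical. The code imports lxml and falls back to the standard library's `xml.etree.ElementTree`, which offers the same calls used here.

```python
import copy

try:
    from lxml import etree
except ImportError:  # the calls below also exist in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical catalogs; the real markup is not shown in the excerpt.
EN = ('<catalog><msg id="greet"><en>Hello</en></msg>'
      '<msg id="bye"><en>Goodbye</en></msg></catalog>')
JA = ('<catalog><msg id="bye"><ja>さようなら</ja></msg>'
      '<msg id="greet"><ja>こんにちは</ja></msg></catalog>')


def merge_by_attribute(target, source, tag, attr):
    """Copy the children of each source element into the target element
    that has the same tag and the same value for `attr`."""
    # Index the source elements by attribute value for direct lookup.
    by_key = {el.get(attr): el for el in source.iter(tag)}
    for el in list(target.iter(tag)):
        match = by_key.get(el.get(attr))
        if match is None:
            continue  # no counterpart in the other file
        for child in match:
            # Copy instead of move so the source tree stays intact.
            el.append(copy.deepcopy(child))
    return target


merged = merge_by_attribute(etree.fromstring(EN), etree.fromstring(JA),
                            "msg", "id")
print(etree.tostring(merged, encoding="unicode"))
```

Building the lookup dictionary first means the order of elements in the two files does not have to match.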

Retrieving tail text from html

∥☆過路亽.° submitted on 2019-12-12 04:29:41
Question: Python 2.7 using lxml. I have some annoyingly formed HTML that looks like this: <td> <b>"John" </b> <br> "123 Main st. " <br> "New York " <b> "Sally" </b> <br> "101 California St. " <br> "San Francisco " </td> So basically it's a single td with a ton of stuff in it. I'm trying to compile a list or dict of the names and their addresses. So far what I've done is gotten a list of nodes with names using tree.xpath('//td/b'). So let's assume I'm currently on the b node for John. I'm trying to get
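Text that follows an element is stored on that element's `.tail` attribute, so one possible approach is to walk the children of the `td` in order and read the tails of the `br` nodes. This is a sketch under assumptions: the fragment below is a cleaned-up, well-formed version of the markup (with `<br/>` and without the literal quotes), and the name `people` is illustrative only. For real-world HTML, `lxml.html.fromstring` would be the parser to try instead of `etree.fromstring`.

```python
try:
    from lxml import etree
except ImportError:  # .text and .tail behave the same in the standard library
    import xml.etree.ElementTree as etree

# Assumed well-formed variant of the fragment from the question.
FRAGMENT = (
    '<td><b>John</b><br/>123 Main st.<br/>New York'
    '<b>Sally</b><br/>101 California St.<br/>San Francisco</td>'
)

td = etree.fromstring(FRAGMENT)

people = {}      # name -> list of address lines
current = None   # name of the person whose lines are being collected
for node in td:
    if node.tag == "b":
        # A <b> element starts a new person.
        current = node.text.strip()
        people[current] = []
    elif node.tag == "br" and current is not None:
        # The text after a <br/> lives on the <br/> element's tail.
        line = (node.tail or "").strip()
        if line:
            people[current].append(line)

print(people)
```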

python: get data from changing span class using lxml xpath

99封情书 submitted on 2019-12-12 04:12:48
Question: I want to extract 'Return On Assets' from the WSJ website. However, my code is not robust enough to work in different conditions. I am able to extract data for ticker 'SCGM' using the code below, but it fails for 'AASIA', where the element is <span class="marketDelta deltaType-negative">. from lxml import html import requests StockData =['SCGM','AASIA'] page_wsj1 = requests.get('http://quotes.wsj.com/MY/'+StockData[x]+'/financials') wsj1 = html.fromstring(page_wsj1.content) wsj_fig = wsj1.xpath('//span[@class="marketDelta
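The XPath in the excerpt is truncated, so it is not clear exactly what it matched. If the problem is that the second class token changes between pages, one option is to match on the stable token only. The sketch below does that in plain Python; the sample markup, its values, and the helper `spans_with_class` are hypothetical. With lxml's `xpath()`, the XPath 1.0 expression `//span[contains(@class, "marketDelta")]` is a common alternative, although it also matches longer class names that merely contain that substring.

```python
try:
    from lxml import etree
except ImportError:  # iter() and get() also exist in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical, well-formed stand-in for the page fragment.
SAMPLE = (
    '<div>'
    '<span class="marketDelta deltaType-positive">1.0</span>'
    '<span class="marketDelta deltaType-negative">-2.0</span>'
    '<span class="other">n/a</span>'
    '</div>'
)


def spans_with_class(root, token):
    """Return every <span> whose class attribute contains `token`
    as a whole whitespace-separated word."""
    return [
        span for span in root.iter("span")
        if token in (span.get("class") or "").split()
    ]


root = etree.fromstring(SAMPLE)
values = [span.text for span in spans_with_class(root, "marketDelta")]
print(values)
```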

LXML unable to retrieve webpage with error “failed to load HTTP resource”

北战南征 submitted on 2019-12-12 04:12:46
Question: Hi, I tried opening the link below in a browser and it works, but it fails in my code. The link is a combination of a news site URL and the article path, which is read from another file, url.txt. I tried the code with a normal website (www.google.com) and it works perfectly. import sys import MySQLdb from mechanize import Browser from bs4 import BeautifulSoup, SoupStrainer from nltk import word_tokenize from nltk.tokenize import * import urllib2 import nltk, re, pprint import

Dynamic search through xml attributes using lxml and xpath in python

試著忘記壹切 submitted on 2019-12-12 04:03:20
Question: I am working to move nested XML data into a hierarchical data frame. I was able to get all of the data out of the XML thanks to help on SO. However, now I am working to clean up and shape the data that I extract before output, because I will be doing this thousands of times. UPDATED: THIS IS WHAT I EVENTUALLY WANT OUT. I cannot seem to fetch just the Time and value for a channel dynamically. The channel names will change for each file. When channel = txt1[0] (for this file, txt1[0]="blah" )

lxml find element by name, but use variable in search

霸气de小男生 submitted on 2019-12-12 03:56:13
Question: I have a problem with the find function in lxml. But I think this is more a generic question of how to tell it that I want to check against the value, not the object reference. So here is the code that works: step = xml_obj.find('.//step/name[text()="Design"]').getparent() If I try to replace the string with a variable, the result is always None. stepn = 'Design' step = xml_obj.find('.//step/name[text()=stepn]').getparent() 'NoneType' object has no attribute 'getparent' Answer 1: stepn = 'Design' step =
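The answer is cut off, so the following is not a reconstruction of it, only a sketch of the general idea: the path is an ordinary string, so a Python name written inside it is never evaluated and the value has to be formatted in. The sample document and the helper `find_step` are hypothetical. The sketch also selects the `step` directly with a child predicate, which avoids the `getparent()` call.

```python
try:
    from lxml import etree
except ImportError:  # find() accepts the same path in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical document; the real one is not shown in the excerpt.
XML = (
    '<flow>'
    '<step><name>Design</name><owner>ann</owner></step>'
    '<step><name>Build</name><owner>bob</owner></step>'
    '</flow>'
)


def find_step(root, step_name):
    """Return the first <step> whose <name> child has the given text,
    or None when there is no such step."""
    # Format the value into the path. This simple version assumes the
    # name contains no single quote.
    path = ".//step[name='{}']".format(step_name)
    return root.find(path)


root = etree.fromstring(XML)
step = find_step(root, "Design")
print(step.find("owner").text)
```

Because `find` returns `None` when nothing matches, checking the result before calling a method on it avoids the `'NoneType' object has no attribute` error. In lxml, `xpath()` additionally accepts XPath variables, for example `root.xpath('.//step[name=$n]', n=stepn)`, which sidesteps quoting problems.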

Write xml from list of path/values

試著忘記壹切 submitted on 2019-12-12 03:51:41
Question: This is a follow-up to the previous question: Write xml with a path and value. I now want to add two additional things: 1) attributes and 2) multiple items under one parent node. Here is the list of paths I have: [ {'Path': 'Item/Info/Name', 'Value': 'Body HD'}, {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'}, {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name', 'Value': 'El Grecco'}, {'Path': 'Item/Genres/Genre', 'Value': 'Action'}, {'Path': 'Item/Genres/Genre',
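One way to approach this, sketched under assumptions, is to split each path into steps, reuse intermediate elements that already exist, and always create a new leaf so that repeated paths produce siblings. The list in the excerpt is truncated, so the value of the second `Genre` row below is hypothetical, as are the helper names `parse_step`, `find_child`, and `build_tree`. The sketch assumes every path has at least two steps, shares one root tag, and has no `/` inside attribute values.

```python
import re

try:
    from lxml import etree
except ImportError:  # Element and SubElement match the standard library
    import xml.etree.ElementTree as etree

ROWS = [
    {'Path': 'Item/Info/Name', 'Value': 'Body HD'},
    {'Path': 'Item/Info/Synopsis', 'Value': 'A great movie'},
    {'Path': 'Item/Locales/Locale[@Country="US"][@Language="ES"]/Name',
     'Value': 'El Grecco'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Action'},
    {'Path': 'Item/Genres/Genre', 'Value': 'Drama'},  # hypothetical value
]

STEP_RE = re.compile(r'^([^\[\]]+)((?:\[[^\]]*\])*)$')
ATTR_RE = re.compile(r'\[@([^=\]]+)="([^"]*)"\]')


def parse_step(step):
    """Split 'Tag[@a="1"][@b="2"]' into ('Tag', {'a': '1', 'b': '2'})."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError("unsupported path step: %r" % step)
    return match.group(1), dict(ATTR_RE.findall(match.group(2)))


def find_child(parent, tag, attrs):
    """Return an existing child with this tag and these attributes."""
    for child in parent:
        if child.tag == tag and all(child.get(k) == v
                                    for k, v in attrs.items()):
            return child
    return None


def build_tree(rows):
    root = None
    for row in rows:
        steps = [parse_step(s) for s in row['Path'].split('/')]
        if root is None:
            root = etree.Element(steps[0][0], steps[0][1])
        node = root
        # Intermediate steps are reused when they already exist.
        for tag, attrs in steps[1:-1]:
            child = find_child(node, tag, attrs)
            node = child if child is not None else etree.SubElement(
                node, tag, attrs)
        # The last step always becomes a new element.
        leaf = etree.SubElement(node, steps[-1][0], steps[-1][1])
        leaf.text = row['Value']
    return root


root = build_tree(ROWS)
print(etree.tostring(root, encoding="unicode"))
```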

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml [duplicate]

☆樱花仙子☆ submitted on 2019-12-12 03:42:32
Question: This question already has answers here: beautifulsoup won't recognize lxml (2 answers). Closed 3 years ago. Can you please suggest a fix? It downloads almost all the images from imgur pages with a single image; I am not sure why it is not working in this case or how to fix it. elif 'imgur.com' in submission.url and not (submission.url.endswith('gif') or submission.url.endswith('webm') or submission.url.endswith('mp4') or 'all' in submission.url or '#' in submission.url or '/a/' in submission.url

“The SOAP request must use SOAP 1.1…”

℡╲_俬逩灬. submitted on 2019-12-12 03:16:53
Question: I am writing some code that generates XML and, using the requests library, POSTs the XML to Salesforce.com's SOAP service. Here is the code that I'm using to generate the XML: from lxml import etree class SalesforceLeadConverter(object): def __init__(self, session_id, lead_id, **kwargs): self.session_id = session_id self.lead_id = lead_id def build_xml(self): root = etree.Element( '{soapenv}Envelope', soapenv='http://schemas.xmlsoap.org/soap
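As far as the truncated excerpt shows, the tag `'{soapenv}Envelope'` puts a prefix where Clark notation expects the namespace URI, and `soapenv=...` is passed as an ordinary attribute. That is one plausible cause of the error, not a confirmed diagnosis. The sketch below shows the `{uri}tag` form with the standard SOAP 1.1 envelope namespace. The function `build_envelope` is hypothetical and leaves out the Salesforce-specific header and body content.

```python
try:
    from lxml import etree
except ImportError:  # Clark notation works the same in the standard library
    import xml.etree.ElementTree as etree

# Namespace URI of the SOAP 1.1 envelope.
SOAPENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Ask the serializer to prefer the conventional "soapenv" prefix.
etree.register_namespace("soapenv", SOAPENV_NS)


def build_envelope():
    """Build an empty SOAP 1.1 envelope with Header and Body children."""
    # Clark notation: the braces hold the namespace URI, not the prefix.
    envelope = etree.Element("{%s}Envelope" % SOAPENV_NS)
    etree.SubElement(envelope, "{%s}Header" % SOAPENV_NS)
    etree.SubElement(envelope, "{%s}Body" % SOAPENV_NS)
    return envelope


envelope = build_envelope()
xml_text = etree.tostring(envelope, encoding="unicode")
print(xml_text)
```

In lxml specifically, passing `nsmap={'soapenv': SOAPENV_NS}` to `etree.Element` is another way to bind the prefix on the root element.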

Python ElementTree XML Modifying Elements with Multiple Values

半世苍凉 submitted on 2019-12-12 02:56:20
Question: Using Python 2.7 and lxml, how do I modify XML elements with multiple values? E.g. <Title> <Playcount>1</Playcount> <Genre>Adventure</Genre> <Genre>Comedy</Genre> <Genre>Action</Genre> </Title> It is straightforward to modify Playcount, as it has a single value. How do I modify Genre, which has multiple values? E.g.: How do I delete all but the first genre? How do I add a genre? How do I change every Baseball genre to Sports? Thanks. Answer 1: Like this: from lxml import etree parser = etree
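The answer is truncated, so the following is not a reconstruction of it, only a sketch of how the three operations from the question could look. Repeated `Genre` elements are ordinary siblings, so `findall` returns them as a list that can be sliced, extended, or edited. The helper names are hypothetical.

```python
try:
    from lxml import etree
except ImportError:  # the same calls exist in the standard library
    import xml.etree.ElementTree as etree

XML = (
    '<Title><Playcount>1</Playcount>'
    '<Genre>Adventure</Genre><Genre>Comedy</Genre><Genre>Action</Genre>'
    '</Title>'
)


def keep_first_genre(title):
    """Delete every <Genre> child except the first one."""
    for genre in title.findall("Genre")[1:]:
        title.remove(genre)


def add_genre(title, name):
    """Append a new <Genre> child with the given text."""
    etree.SubElement(title, "Genre").text = name


def rename_genre(title, old, new):
    """Change the text of every <Genre> equal to `old` into `new`."""
    for genre in title.findall("Genre"):
        if genre.text == old:
            genre.text = new


title = etree.fromstring(XML)
keep_first_genre(title)
add_genre(title, "Baseball")
rename_genre(title, "Baseball", "Sports")
genres = [g.text for g in title.findall("Genre")]
print(genres)
```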