xpath

Parse XML file to get all Namespace information

Submitted by 瘦欲@ on 2020-02-01 05:28:26
Question: I want to be able to get all namespace information from a given XML file. For example, the input XML file might look like:

    <ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
      <ns1:article xmlns:ns1='http://predic8.com/material/1/'>
        <ns1:id>1</ns1:id>
        <description>bar</description>
        <name>foo</name>
        <ns1:price>
          <amount>00.00</amount>
          <currency>USD</currency>
        </ns1:price>
        <ns1:price>
          <amount>11.11</amount>
          <currency>AUD</currency>
        </ns1:price>
      </ns1
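One way to collect every namespace declaration is lxml's iterparse with "start-ns" events, which yields each (prefix, uri) pair as it is declared, including redeclarations of the same prefix. A sketch on a simplified version of the file:

```python
from io import BytesIO
from lxml import etree

xml = b"""<ns1:create xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/">
  <ns1:article xmlns:ns1="http://predic8.com/material/1/">
    <ns1:id>1</ns1:id>
    <description>bar</description>
  </ns1:article>
</ns1:create>"""

# "start-ns" events report every namespace declaration as a
# (prefix, uri) tuple, even when a prefix is redeclared.
namespaces = set()
for event, ns in etree.iterparse(BytesIO(xml), events=("start-ns",)):
    namespaces.add(ns)
print(sorted(namespaces))
```

Note that both declarations use the prefix `ns1`, so mapping prefix to URI in a plain dict would silently drop one of them; collecting the tuples keeps both.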

Scrapy project - source code - a spider that crawls Tencent's social recruitment postings

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-31 23:39:35
1. tencentSpider.py

    # -*- coding: utf-8 -*-
    import scrapy
    from Tencent.items import TencentItem

    # Create the spider class
    class TencentspiderSpider(scrapy.Spider):
        name = 'tencentSpider'  # spider name
        allowed_domains = ['tencent.com']  # domains the spider is allowed to crawl
        # Define the starting URL
        offset = 0
        url = 'https://hr.tencent.com/position.php?&start='
        # urll = '#a'
        start_urls = [url + str(offset)]  # URL the crawl starts from

        def parse(self, response):
            item = TencentItem()  # inherit
            # Root nodes: both odd- and even-styled table rows
            movies = response.xpath("//tr[@class='odd']|//tr[@class='even']")
            for each in movies:
                item['zhiwei'] = each.xpath(".//td[@class='l square']/a/text()").extract()[0]
                item['lianjie'] = each.xpath("./
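The odd/even union XPath above can be tried outside Scrapy with plain lxml; the table markup below is a hypothetical stand-in for the recruitment page:

```python
from lxml import etree

# Hypothetical table modeled on the recruitment page's markup.
html = '''<table>
  <tr class="even"><td class="l square"><a href="/p?id=1">Engineer</a></td></tr>
  <tr class="odd"><td class="l square"><a href="/p?id=2">Designer</a></td></tr>
</table>'''
doc = etree.HTML(html)
# The | operator unions both node sets, so odd and even rows
# are collected in a single pass, in document order.
rows = doc.xpath("//tr[@class='odd']|//tr[@class='even']")
titles = [r.xpath(".//td[@class='l square']/a/text()")[0] for r in rows]
print(titles)
```

The union keeps document order, which is why the spider can walk the postings top to bottom with one selector instead of two separate queries.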

Crawler example: a Tang and Song poetry crawler

Submitted by *爱你&永不变心* on 2020-01-31 22:22:26
Every year I hope summer will hurry into autumn: a September without wooden ponytails, a South without miracles of color; only in classical poetry can I bargain with my moods, count them as mist, and mark the seasons.

Basic analysis

1. From the poetry site's page structure, poem bodies appear in two forms: one separated by <p> tags and one separated by <br> tags.

    from lxml import etree

    s = """
    <div id="contson47919" class="contson">
        <p>aaa<br>a</p>
        <p>bb</p>
        <p>c</p>
    </div>
    """
    selector = etree.HTML(s)
    #s = selector.xpath('//*[@class="contson"]/p')          # list of <p> elements
    #s = selector.xpath('string(//*[@class="contson"]/p)')  # 'aaaa'
    s = selector.xpath('string(//*[@class="contson"])')     # 'aaaa \n bb \n c'
    #print(list(s))  # ['\n', 'a', 'a', 'a', 'a', '\n', 'b', 'b', '\n', 'c', '\n']
    s = [i for i in s if i != '\n']  # ['a', 'a', 'a', 'a', 'b', 'b', 'c']
    s = ''.join(s)
    print(s)
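Note that string() fuses the <br>-separated fragments together ('aaa' and 'a' become 'aaaa'); the newlines in its result come only from whitespace in the source markup. Using .//text() instead keeps each text node separate, which preserves the <br> line breaks. A small sketch on the same sample:

```python
from lxml import etree

html = '<div class="contson"><p>aaa<br>a</p><p>bb</p><p>c</p></div>'
div = etree.HTML(html).xpath('//*[@class="contson"]')[0]
# .//text() returns every text node individually, so fragments
# split by <br> stay distinct ('aaa' and 'a'), unlike string().
pieces = [t.strip() for t in div.xpath('.//text()') if t.strip()]
print(pieces)
```

This makes the two page structures (p-separated and br-separated) fall out of the same query, with one list entry per visual line.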

Filling out “First Name” field of a signup page

Submitted by 为君一笑 on 2020-01-30 13:18:48
Question: The problem I'm currently having is trying to fill out the First Name field of a signup page. I've managed to fill out the email field and select the gender using Selenium. When I try to locate the First Name element by its XPath, I get the following error:

    selenium.common.exceptions.NoSuchElementException: Message: no such element:
    Unable to locate element: {"method":"xpath","selector":"//*[@id="bb1bda44-91c9-4668-8641-4f3bbbd0c6cd"]"}

Code:

    import selenium
    from selenium.webdriver.common.by import
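A GUID-looking id like the one in that error is usually regenerated on every page load, so locating by a stable attribute is more robust. A minimal sketch of the idea with lxml; the name and placeholder attributes here are hypothetical stand-ins for whatever the real form exposes:

```python
from lxml import etree

# Hypothetical signup form: the id is auto-generated per session,
# but attributes like name/placeholder tend to stay stable.
html = '''<form>
  <input id="bb1bda44-91c9-4668-8641-4f3bbbd0c6cd"
         name="firstname" placeholder="First Name"/>
</form>'''
doc = etree.HTML(html)
# Match on the stable attribute instead of the volatile id.
el = doc.xpath('//input[@name="firstname"]')[0]
print(el.get('placeholder'))
```

In Selenium the same idea would be `driver.find_element_by_xpath('//input[@name="firstname"]')`, assuming the field actually carries such an attribute; inspect the element to pick whichever attribute survives a page reload.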

Python Selenium - How to loop to the last <li> element in a site

Submitted by 左心房为你撑大大i on 2020-01-30 13:04:34
Question: I have created a Python Selenium script that should navigate through a website and collect people's profiles (https://www.shearman.com/people). The program won't loop through the pages to collect the links. I have used the following, which doesn't work:

    try:
        # navigate to the next page
        driver.find_element_by_xpath('//div[@id="searchResultsSection"]/ul/li[12]').click()
        time.sleep(1)
    except NoSuchElementException:
        break

The markup behind the next button can be seen below:

    <a href="" onclick=
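Hard-coding `li[12]` breaks as soon as the pager renders a different number of items; `li[last()]` targets the final <li> regardless of count. A sketch of the stop condition using lxml over hypothetical pager markup (the class name `next` is a made-up marker, not the site's actual attribute):

```python
from lxml import etree

# Hypothetical pager: the "Next" control sits in the last <li>
# of the results list, and disappears on the final page.
pages = [
    '<div id="searchResultsSection"><ul><li>p1</li><li><a class="next">Next</a></li></ul></div>',
    '<div id="searchResultsSection"><ul><li>p2</li></ul></div>',  # last page: no Next link
]
visited = []
for html in pages:
    doc = etree.HTML(html)
    visited.append(doc.xpath('//li[1]/text()')[0])  # scrape this page
    # Selenium equivalent: find_element_by_xpath(...).click() inside
    # try/except NoSuchElementException to detect the last page.
    if not doc.xpath('//div[@id="searchResultsSection"]/ul/li[last()]/a[@class="next"]'):
        break
print(visited)
```

The same `li[last()]` expression dropped into the original try/except keeps the loop working whatever the page count is.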

How to select an option from the kendo dropdown using selenium webdriver and Java

Submitted by 痞子三分冷 on 2020-01-30 12:52:05
Question: Here is my HTML code:

    <ul unselectable="on" class="k-list k-reset" tabindex="-1" aria-hidden="true"
        id="ddlSettleMode_listbox" aria-live="polite" data-role="staticlist" role="listbox">
      <li tabindex="-1" role="option" unselectable="on"
          class="k-item k-state-selected k-state-focused" data-offset-index="0"
          id="18e2d509-b1e1-4588-bd2a-dcff29b45b83">Select</li>
      <li tabindex="-1" role="option" unselectable="on" class="k-item"
          data-offset-index="1">Cash</li>
      <li tabindex="-1" role="option" unselectable
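With Kendo's static list, one approach is to select the <li> by its visible text rather than by the generated id, which changes between sessions. A sketch with lxml over a trimmed copy of the markup:

```python
from lxml import etree

html = '''<ul id="ddlSettleMode_listbox" role="listbox">
  <li role="option" class="k-item k-state-selected">Select</li>
  <li role="option" class="k-item">Cash</li>
</ul>'''
doc = etree.HTML(html)
# Locate the option by its visible text; generated ids like
# 18e2d509-... change, but the label "Cash" does not.
item = doc.xpath('//ul[@id="ddlSettleMode_listbox"]/li[normalize-space()="Cash"]')[0]
print(item.text)
```

In Selenium the equivalent XPath can be clicked directly, typically after first clicking the dropdown arrow so the Kendo listbox is actually visible (it is rendered with `aria-hidden="true"` until opened).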

Decoding Class names on facebook through Selenium

Submitted by 百般思念 on 2020-01-30 12:30:32
Question: I noticed that Facebook has some weird class names that look computer-generated. What I don't know is whether these classes are at least constant over time, or whether they change at some interval. Maybe someone who has experience with this can answer. The only thing I can see is that when I exit Chrome and open it again they are still the same, so at least they don't change every browser session. So I'd guess the best way to go about scraping Facebook would be to use some elements in the user interface and