xpath

XPath: find a specific link in a page

Submitted by 一个人想着一个人 on 2020-02-05 13:04:27
Question: I'm trying to get the "email to a friend" link from this page using XPath: http://www.guardian.co.uk/education/2009/oct/14/30000-miss-university-place. The link itself is wrapped up in tags like this:

<li><a class="rollover sendlink" href="http://www.guardian.co.uk/email/354237257" title="Opens an email form" name="&lid={pageToolbox}{Email a friend}&lpos={pageToolbox}{2}"><img src="http://static.guim.co.uk/static/80163/common/images/icon_email-friend.gif" alt="" class="trail-icon" /><span>Send to
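One way to pull that href with lxml, sketched below under the assumption that the class attribute is stable; contains() on @class is the usual XPath 1.0 idiom for multi-class attributes:

import requests
from lxml import html

page = requests.get('http://www.guardian.co.uk/education/2009/oct/14/30000-miss-university-place')
tree = html.fromstring(page.content)
# Match the anchor by one of its classes rather than the full class list.
links = tree.xpath('//li/a[contains(@class, "sendlink")]/@href')
print(links)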

Python XPath query not returning a text value

Submitted by 此生再无相见时 on 2020-02-05 05:53:29
Question: I am trying to scrape data from the following page using the lxml module in Python: http://www.thehindu.com/todays-paper/with-afspa-india-has-failed-statute-amnesty/article7376286.ece. I want to get the text in the first paragraph, but the following code is returning a null value:

from lxml import html
import requests

page = requests.get('http://www.thehindu.com/todays-paper/with-afspa-india-has-failed-statute-amnesty/article7376286.ece')
tree = html.fromstring(page.text)
data = tree.xpath('//*[
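An empty result here often means the XPath points at an element whose text actually lives in descendant nodes. The original expression is cut off, so the selector below is an assumption; text_content() gathers all nested text, unlike /text(), which returns only the element's direct text nodes:

import requests
from lxml import html

page = requests.get('http://www.thehindu.com/todays-paper/with-afspa-india-has-failed-statute-amnesty/article7376286.ece')
tree = html.fromstring(page.content)  # bytes avoid double-decoding surprises
paras = tree.xpath('//div[contains(@class, "article")]//p')  # hypothetical selector
if paras:
    print(paras[0].text_content())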

2020 Study Notes 04: Scraping Letters from a Government Website with Python

Submitted by 纵饮孤独 on 2020-02-05 01:09:24
Straight to the code:

# coding: utf-8
import requests
from lxml import etree
import time
import pymysql
import datetime
import urllib
import json
from IPython.core.page import page

conn = pymysql.connect(
    host="localhost", user="root", port=3306,
    password="123456", database="bjxj")
gg = 2950

def db(conn, reqcontent, reqname, reqtime, resname, restime, rescontent, reqtype, isreply):
    cursor = conn.cursor()
    # cursor.execute(
    #     "INSERT INTO xinjian(name) VALUES (%s)",
    #     [name])
    if isreply == False:
        isreply = 0
        restime1 = ''
    else:
        isreply = 1
        restime1 = restime[0]
    print(reqcontent)
    print(reqname)
    print(reqtime)
    print
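The commented-out insert hints at the intended write path; below is a minimal sketch of a parameterized pymysql insert, where the xinjian table and the column names are assumptions carried over from the snippet:

def save_letter(conn, reqname, reqcontent, reqtime):
    # Parameterized query: pymysql escapes the values, which avoids
    # SQL injection and quoting bugs in scraped text.
    with conn.cursor() as cursor:
        cursor.execute(
            "INSERT INTO xinjian (reqname, reqcontent, reqtime) VALUES (%s, %s, %s)",
            (reqname, reqcontent, reqtime))
    conn.commit()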

Using the Selenium Library

Submitted by 只谈情不闲聊 on 2020-02-05 00:39:58
Contents: 1. Downloading and installing the ChromeDriver simulated browser; 2. Installing and using the Selenium library (2.1 installation, 2.2 usage).

Selenium is an automated-testing tool that can drive a browser to simulate human actions such as mouse clicks and keyboard input. With Selenium it is fairly easy to obtain a page's source code and to batch-download web content automatically; most importantly, it can quickly retrieve the source of dynamically rendered pages.

1. Downloading and installing the ChromeDriver simulated browser

Before learning Selenium, you need to download and install ChromeDriver. Its job is to give Python a driveable browser: Python visits pages through it, and via Selenium performs mouse and keyboard operations and retrieves the page source.

Install Google Chrome and check its version number. This step is necessary because the ChromeDriver version must match the Chrome version. To check the version: Help -> About Google Chrome.

Downloading and installing ChromeDriver: you can download ChromeDriver from the mirror site http://npm.taobao.org/mirrors/chromedriver/. Unzip the downloaded archive to get an .exe executable, and copy it into the Scripts folder of your Python installation directory.
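Once chromedriver is on the PATH (the Scripts folder usually is for a standard Python install), a minimal usage sketch looks like this; the URL is a placeholder:

from selenium import webdriver

driver = webdriver.Chrome()  # locates chromedriver via the PATH
driver.get('https://example.com')
# page_source reflects the DOM after JavaScript has run, which is
# exactly what makes Selenium useful for dynamically rendered pages.
print(driver.page_source[:200])
driver.quit()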

Mooctest "Independent and Controllable" Testing

Submitted by 余生长醉 on 2020-02-05 00:22:40
I've recently been preparing for the "independent and controllable" testing competition, which requires using the 360 browser.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.firefox.FirefoxBinary;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.By;

public class Example {

    // Mooctest Selenium Example

    // <!> Check if selenium-standalone.jar is added to build path.

    public static void test(WebDriver
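360 Secure Browser is Chromium-based, so one common approach (a sketch, not competition-verified) is to point the Chrome driver at the 360 binary; shown in Python to match the rest of this digest, with an assumed install path:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Hypothetical install location of the 360 browser executable.
options.binary_location = r"C:\Users\me\AppData\Roaming\360se6\Application\360se.exe"
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
driver.quit()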

How to use a for loop in XSLT and get node values based on the iteration

Submitted by 倖福魔咒の on 2020-02-04 05:57:41
Question: How do we use a for loop in XSLT? I have a requirement to convert the XML shown below into a comma-separated file. The number of rows in the CSV file should equal the count of COBRA_Records_within_Range nodes for an employee's report entry. All values in the 3 rows will be the same except the child element values of the COBRA_Records_within_Range nodes. I am able to create 3 rows but not able to retrieve the values for the child elements of COBRA_Records_within_Range. I want to run the for loop on a
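XSLT 1.0 has no counter-style for loop; xsl:for-each iterates over a node-set, emitting one output chunk per node. A minimal sketch, driven from Python with lxml to match the rest of this digest; the element names are guesses based on the question, since the XML itself is cut off:

from lxml import etree

xml = etree.XML("""
<Report_Entry>
  <Employee>Jane</Employee>
  <COBRA_Records_within_Range><Code>A</Code></COBRA_Records_within_Range>
  <COBRA_Records_within_Range><Code>B</Code></COBRA_Records_within_Range>
  <COBRA_Records_within_Range><Code>C</Code></COBRA_Records_within_Range>
</Report_Entry>
""")

xslt = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/Report_Entry">
    <!-- One CSV row per COBRA_Records_within_Range node; the shared
         employee value is reached with a relative step back up. -->
    <xsl:for-each select="COBRA_Records_within_Range">
      <xsl:value-of select="../Employee"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="Code"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
""")

print(etree.XSLT(xslt)(xml))  # Jane,A / Jane,B / Jane,C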

LinkedIn web scraping with Selenium

Submitted by 坚强是说给别人听的谎言 on 2020-02-04 04:03:31
Question: I am new to web development and scraping in general, and I am trying to challenge myself by scraping websites like LinkedIn. Since the site uses Ember and has dynamically changing ids, it is more of a struggle to scrape properly. I am trying to scrape the "experience section" of a LinkedIn profile using the following code:

experience = driver.find_element_by_xpath('//section[@id = "experience-section"]/ul/li[@class="position"]')

The driver got the entire LinkedIn profile webpage. I would
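Dynamically rendered sections usually need an explicit wait before an XPath can match anything; below is a sketch that keeps the question's id-based locator, though LinkedIn's markup changes often, so treat the selector and URL as illustrative:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/some-profile/')  # placeholder profile
# Wait up to 10 s for the section to appear in the rendered DOM.
section = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//section[@id="experience-section"]')))
for position in section.find_elements(By.XPATH, './/ul/li'):
    print(position.text)
driver.quit()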

How to deal with single and double quotes in XPath in Python

Submitted by 谁都会走 on 2020-02-04 03:47:05
Question: I have an XPath that contains a single quote, which is causing a SyntaxError. I've tried an escape sequence:

xpath = "//label[contains(text(),'Ayuntamiento de la Vall d'Uixó - Festivales Musix')]"

But I am still facing an error:

SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//label[contains(text(),'Ayuntamiento de la Vall d'Uixó - Festivales Musix')]' is not a valid XPath expression.

Answer 1: There is no quote escaping in XPath string literals. (Note: This
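Two standard workarounds, sketched below: delimit the XPath literal with the quote type the text does not contain, and fall back to concat() when a value mixes both:

# The text holds only a single quote, so double-quote the XPath literal:
text = "Ayuntamiento de la Vall d'Uixó - Festivales Musix"
xpath = f'//label[contains(text(), "{text}")]'
print(xpath)

# When a value contains BOTH quote types, build it from pieces with concat():
xpath_mixed = """//label[contains(text(), concat('It', "'", 's "mixed"'))]"""
print(xpath_mixed)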

Web Crawling: Data Parsing

Submitted by 别等时光非礼了梦想. on 2020-02-04 01:42:18
Contents: XPath and the lxml library (XPath basic syntax, usage, notes), the BeautifulSoup4 library, regular expressions and the re module, and a comparison of parsing tools.

XPath and the lxml library

XPath basic syntax: 1. selecting nodes; 2. predicates; 3. wildcards.

Usage: use // to select elements anywhere in the page, then write the tag name, then add a predicate to extract what you need.

# Parsing HTML with the lxml library:
from lxml import etree  # import needed for the calls below

# 1. Parse an HTML string
html = etree.HTML(text)

# 2. Parse an HTML file
# Specify the parser explicitly; the default is the XML parser
parser = etree.HTMLParser(encoding='utf-8')
html = etree.parse("index.html", parser=parser)

# 1. Get all tr tags
trs = html.xpath("//tr")
# 2. Get the second tr tag
trs = html.xpath("//tr[2]")
# 3. Get all tr tags whose class equals 'even'
trs = html.xpath("//tr[@class='even']")
# 4. Get the href attribute of every a tag
a = html.xpath("//a/@href")

Notes

BeautifulSoup4 library
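Putting the calls above together, a self-contained sketch over a small made-up document:

from lxml import etree

text = """
<table>
  <tr class="even"><td><a href="/a">first</a></td></tr>
  <tr><td><a href="/b">second</a></td></tr>
</table>
"""
html = etree.HTML(text)
print(html.xpath("//tr[@class='even']//a/@href"))  # ['/a']
print(html.xpath("//a/text()"))                    # ['first', 'second']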

Get content of list of span elements with HTMLUnit and XPath

Submitted by 自古美人都是妖i on 2020-02-03 09:36:46
Question: I want to get a list of values from an HTML document. I am using HTMLUnit. There are many span elements with the class "topic", and I want to extract the content within the span tags:

<span class="topic">
  <a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a>
</span>

My code looks like this:

List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");

However, whenever I try to iterate over the list, I get a NoSuchElementException.
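The span's only direct text nodes are whitespace; the visible text belongs to the nested <a>, so /text() on the span yields nothing useful. Descending one step, as in //span[@class='topic']/a/text(), is the likely fix; here the selector is exercised with lxml in Python to match the rest of this digest:

from lxml import etree

doc = etree.HTML("""
<span class="topic">
  <a href="http://website.com/page/2342">Lean Startup</a>
</span>
""")
# /text() directly on the span returns only whitespace nodes;
# the anchor holds the actual topic label.
print(doc.xpath("//span[@class='topic']/a/text()"))  # ['Lean Startup']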