xpath | 易学教程

Can't locate element on Microsoft Sign In page

Question: I'm currently trying to locate the email input box on the Microsoft Sign In page using XPath (and other locators as well), but after many tries I still can't locate the correct element. After copying the element from the page, this is the element given: <input type="email" class="form-control" placeholder="Email, phone, or Skype" aria-required="true" spellcheck="false" autocomplete="off" data-bind=" hasFocus: focus, textInput: email, attr: {'placeholder': config.text.emailPlaceHolder, 'aria-invalid':

Python crawler: scraping Tencent job postings with Selenium

Scrape Tencent job postings with Selenium and save them to Excel. The code is fairly simple, so here is the source directly:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from lxml import etree
import xlwt

class Tencent(object):
    def __init__(self, url):
        self.url = url
        self.driver = webdriver.Chrome()
        self.data_list = []
        self.main()

    # Return the page content
    def get_content_by_selenium(self, url):
        self.driver.get(url)
        # Explicit wait until div[@class="correlation-degree"] has loaded
        wait = WebDriverWait(self.

Python Scrapy framework

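Global commands:
scrapy startproject <project name>: create a new Scrapy project
scrapy version: show the Scrapy version
scrapy version -v: also show the versions of the libraries Scrapy depends on
scrapy view <url>: view the page at a URL as Scrapy fetched it
scrapy shell <url>: test a URL interactively; then enter response.text to get the page source
scrapy fetch <url>: download the page source
scrapy bench: run a quick benchmark
scrapy list: list the spiders available in the project
scrapy edit <spider name>: edit a spider file

Project commands:
scrapy genspider <spider name> <domain>: create a new spider
scrapy crawl <spider name>: run a spider

In settings.py, add LOG_LEVEL = 'WARNING' to suppress lower-level log messages so that mostly the scraped results are shown. LOG_FILE = './log.log' writes the log to log.log.

item = {}
item['name'] = response.xpath().extract()
extract(): returns a list of strings. If there is only one match, it still returns a list, in the form ['ABC'].
item['name'] = response.xpath().extract_first()
extract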

[B10] Crawler course 02

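Data parsing

1. XPath syntax and the lxml module

Usage: use // to select elements from anywhere in the page, then write the tag name, then add a predicate to narrow the match.
//div[@class='abc']

Points to note:
1. The difference between / and //: / selects only direct children, while // selects descendants at any depth.
2. contains: when an attribute holds several values, use contains:
//div[contains(@class,'job_detail')]
3. Predicate indices start at 1.

Parsing HTML with lxml:
1. Parse an HTML string with lxml.etree.HTML:
htmlElement = etree.HTML(text)
print(etree.tostring(htmlElement, encoding='utf-8').decode('utf-8'))
2. Parse an HTML file with lxml.etree.parse. This function uses the XML parser by default, so you need to create an HTML parser yourself:
parser = etree.HTMLParser(encoding='utf-8')
htmlElement = etree.parse('qingyunian.html', parser=parser)
print(etree.tostring(htmlElement, encoding='utf-8').decode('utf-8'))

Example: from lxml import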
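The difference between / and // and the 1-based predicate index can be checked with a small sketch. This is an illustration only: the excerpt uses lxml, while the sketch swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath (for instance, it has no contains()). The sample markup and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the course material
doc = ET.fromstring(
    "<root>"
    "<div class='abc'><p>first</p><p>second</p></div>"
    "<section><div class='abc'><p>nested</p></div></section>"
    "</root>"
)

# "./div" matches only direct children of the root element
direct = doc.findall("./div[@class='abc']")

# ".//div" matches descendants at any depth
anywhere = doc.findall(".//div[@class='abc']")

# Predicate positions start at 1, so p[1] is the first <p> child
first_p = doc.find("./div/p[1]")

print(len(direct), len(anywhere), first_p.text)
```

With lxml the same expressions would be passed to the xpath() method of the parsed tree, and contains(@class, '...') becomes available as well.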

Scraping citizens' letters from the Beijing municipal government site

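Problem: the URL does not change when paging because the content is loaded via Ajax, so I captured the network traffic. However, the Request URL does not change on paging either (many similar tutorials rely on finding a pattern in how the URL changes). I therefore chose to use selenium together with Chrome, simulating a browser and entering the page number to fetch each page. Source code:

from lxml import etree
import requests
import csv
from selenium import webdriver
import time
import os
from selenium.webdriver.chrome.webdriver import WebDriver

# Create the CSV file
outPath = 'D://xinfang_data.csv'
if (os.path.exists(outPath)):
    os.remove(outPath)
fp = open(outPath, 'wt', newline='', encoding='utf-8')  # create the CSV file
writer = csv.writer(fp)
writer.writerow(('kind', 'time', 'processingDepartment', 'content'))

# Request headers
headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36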

Finding parent from child in XML using python

Question: I'm new to this, so please be patient. Using ETree and Python 2.7, I'm trying to parse a large XML file that I did not generate. Basically, the file contains groups of voxels contained in a large volume. The general format is: <things> <parameters> <various parameters> </parameters> <thing id="1" comment="thing1"> <nodes> <node id="1" x="1" y="1" z="1"/> <node id="2" x="2" y="2" z="2"/> </nodes> <edges> <edge source="1" target="2"/> </edges> </thing> <thing id="N" comment="thingN"> <nodes>
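One common way to get from a child back to its parent with ElementTree is to build a child-to-parent map once, since elements do not store a reference to their parent. The sketch below assumes a trimmed version of the XML in the question; parent_of and enclosing_thing are hypothetical names chosen for the illustration, and the excerpt does not show which approach the original thread settled on.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the format shown in the question
xml_text = """
<things>
  <thing id="1" comment="thing1">
    <nodes>
      <node id="1" x="1" y="1" z="1"/>
      <node id="2" x="2" y="2" z="2"/>
    </nodes>
    <edges>
      <edge source="1" target="2"/>
    </edges>
  </thing>
</things>
"""

root = ET.fromstring(xml_text)

# Map every element to its parent; the root has no entry
parent_of = {child: parent for parent in root.iter() for child in parent}


def enclosing_thing(element):
    """Walk up the parent map until a <thing> element is reached (or None)."""
    while element is not None and element.tag != "thing":
        element = parent_of.get(element)
    return element


node = root.find(".//node[@id='2']")
thing = enclosing_thing(node)
print(thing.get("comment"))
```

Building the map costs one pass over the tree, after which each lookup is a dictionary access, which matters for a large file.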

Get grandParent using xpath in selenium webdriver

Question: <div class="myclass"> <span>...</span> <div> <table> <colgroup>...</> <tbody> <tr>...</tr> <tr> <td> <input ...name="myname" ...> </td> </tr> <tr>...</tr> <tbody> </table> </div> </div> This is my HTML code. I have the attribute "name", so I can access the tag, but from this line I want to access the top element. One way is to write driver.find_element_by_xpath('..') 6-7 times to get that parent, but I don't know how many steps up I have to go. I simply want an XPath expression or similar thing

JMeter: regular expressions and XPath

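I. Regular Expression Extractor

1. Adding a regular expression

Right-click the request that precedes the one needing the data and add a Post Processor > Regular Expression Extractor. The fields are:
(1) Reference Name: the name of the variable that the next request will reference. For example, if you enter activityID, you can refer to it as ${activityID}.
(2) Regular Expression: the part enclosed in () is what gets extracted. . matches any single character. + means one or more times. ? makes the match non-greedy, so it stops at the first possible match. Note: inside a character class the dot is literal, so [.\n]+ matches only dots and newlines; to match any character including line breaks, use something like [\s\S]+ or the (?s) flag.
(3) Template: written between $ signs. If the regular expression contains several groups (several parenthesized parts), the template can be $2$$3$ and so on, indicating which captured group is assigned to the reference name. For example, $1$ means the first captured group.
(4) Match No.: 0 picks a random match, a positive number N picks the Nth match, and -1 returns all matches. Usually 0 is entered. In LR the extracted value is an array that still needs handling; in LR11 a random function avoids writing a large block of code to process the array.
(5) Default Value: the value assigned to the variable if nothing is matched.

2. Examples of regular expressions

(1) Extracting a single string: suppose the tester wants to match the following part of a web page: <input type="hidden" name="passport" id="passport" value="1234567897"/> and extract 1234567897. A regular expression that meets the requirement: <input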
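The capture-group and non-greedy behaviour described above can be tried outside JMeter. The sketch below uses Python's re module rather than JMeter's own regex engine, so it only illustrates the pattern semantics; the pattern itself is an assumption, because the excerpt's expression is cut off after "<input".

```python
import re

# Sample line from the excerpt
html = '<input type="hidden" name="passport" id="passport" value="1234567897"/>'

# Assumed pattern: the parenthesized group is the part that gets extracted
pattern = r'name="passport" id="passport" value="(.+?)"'
match = re.search(pattern, html)

# Mirrors the "default value" field: fall back when nothing matches
passport = match.group(1) if match else "NOT_FOUND"

# Greedy versus non-greedy on a line with more than one quoted value
line = 'value="a" title="b"'
greedy = re.search(r'value="(.+)"', line).group(1)
lazy = re.search(r'value="(.+?)"', line).group(1)

print(passport, greedy, lazy)
```

In the extractor, the equivalent setup would put the pattern in the Regular Expression field with $1$ as the template.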

Basic usage of JSONPath

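JSONPath: XPath applied to JSON.

One of XML's biggest advantages is the large number of tools available to analyze, transform, and selectively extract data from documents. XPath is one of the most powerful of these tools. If an XPath-like tool could be used on JSON, the following problems could be solved:
1. Data can be found and extracted interactively on the client without special scripting.
2. JSON data requested by the client can be reduced to the relevant parts on the server, which minimizes the bandwidth used by the server response.

If we accept this, a tool that can query JSON data becomes worthwhile. The next questions are how it should work and what JSONPath expressions should look like. JSON is a natural data representation for the C family of programming languages, so a given language usually has native syntax for accessing JSON data.

The XPath expression
/store/book[1]/title
can be read as
x.store.book[0].title
or
x['store']['book'][0]['title']
in JavaScript, Python, and PHP, where the variable x holds the JSON data. From this we can see that a given language already has a basic XPath-like feature built in.

Design goals for the JSONPath tool:
- Build on those characteristics of the host language
- Build on the core of XPath 1.0
- Keep code size and memory consumption small
- Be efficient at runtime

JSONPath expressions: JSONPath is modeled on the way XPath expressions parse XML documents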
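The mapping from /store/book[1]/title to native access can be sketched in Python. The sample data and the get_path helper are hypothetical, added only for illustration; note that the XPath index 1 corresponds to list index 0.

```python
import json
from functools import reduce

# Hypothetical sample data shaped like the store/book example
x = json.loads('{"store": {"book": [{"title": "First"}, {"title": "Second"}]}}')

# Native equivalent of the XPath /store/book[1]/title
title = x["store"]["book"][0]["title"]


def get_path(data, steps):
    """Follow a list of keys and indexes through nested dicts and lists."""
    return reduce(lambda current, step: current[step], steps, data)


second = get_path(x, ["store", "book", 1, "title"])
print(title, second)
```

A helper like get_path only covers fixed paths; wildcards and filters are where a dedicated JSONPath implementation would come in.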
