xpath

Extract html source code with selenium xpath

为君一笑 posted on 2020-05-09 17:14:27
Question: I want to extract data from a very long table. For every row in the table I want to copy a specific piece of HTML. The HTML source (which I want to extract in full) looks like this: <div class="relative"> <div id="stats9526577" class="dark"></div> <img src="/detaljer-1.gif" onmouseover="view_stats(9526577, 14, 13, 4, 7, 10, 8, 6, 3);"> </div> I tried this Python code: data = driver.find_elements_by_xpath('//div[@class="relative"]') How can I print the above HTML source code in Python using
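
One common way to get the full markup of a matched element in Selenium is to read its "outerHTML" attribute. The sketch below assumes `driver` is an already started Selenium WebDriver; the helper name `collect_outer_html` is hypothetical. It passes the locator strategy as the plain string "xpath", which is the value of Selenium's `By.XPATH` constant, so the snippet itself needs no imports.

```python
def collect_outer_html(driver, xpath='//div[@class="relative"]'):
    """Return the complete HTML markup of every element matching `xpath`.

    `driver` is assumed to be a live Selenium WebDriver (an assumption of
    this sketch); "xpath" is the string value of selenium's By.XPATH.
    """
    elements = driver.find_elements("xpath", xpath)
    # "outerHTML" includes the element's own tag; "innerHTML" would not.
    return [element.get_attribute("outerHTML") for element in elements]


# Usage, once a driver has loaded the page:
#     for markup in collect_outer_html(driver):
#         print(markup)
```

Reading "innerHTML" instead would return only the children of each `<div class="relative">`, without the wrapping `<div>` itself.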

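Spiderman: Java open-source vertical crawler scraping example [slightly complex requirements]

强颜欢笑 posted on 2020-05-08 04:22:08
First, note that this article only covers an example of Spiderman parsing XML. Parsing HTML with Spiderman works on much the same principle, although it tests the crawler's capabilities more. I will cover that in detail in a separate article (that article is now available) :) The spiderman-sample project on GitHub contains several examples you can run. Spiderman's project page: http://www.oschina.net/p/spiderman
1. Spiderman is a vertical-domain crawler. It fetches the content of specific target pages and parses it into the business data you need. The whole process aims to require no coding at all, which keeps deployment simple and makes it easy to adapt when page content changes.
2. The target URL scraped in this demo is http://www.alldealsasia.com/feeds/xml, an XML file that lists all of the site's active deals.
3. Setting up Spiderman with Git and Maven is not covered in detail here.
4. Straight to the results. This is the target page [an XML page]. To reach the goal above, you configure an XML file that tells Spiderman what to do. Finally, here is the data produced by the crawl; I write it to a file in the callback method: // initialize the spider Spiderman.init(new SpiderListener() { public void onNewUrls(Thread thread, Task task,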

CSS selector and XPath in Selenium Python [closed]

谁说我不能喝 posted on 2020-05-01 06:41:12
Question (closed as needing details or clarity): How about using CSS selector in Selenium Python if I am not getting the id, name, or class of that HTML element? How about preferring CSS in comparison to XPath?
Answer 1: No idea what you are trying to ask here. I can only take a guess. How about using css selector in Selenium Python if I am not getting
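
As a rough illustration of the comparison, an element without an id, name, or class can still be located through its tag, position, or other attributes, in either syntax. The sketch below assumes `driver` is a running Selenium WebDriver and that the page contains a form with a submit input; both are assumptions, and `find_submit_buttons` is a hypothetical helper. The strategy strings are the values of Selenium's `By.CSS_SELECTOR` and `By.XPATH` constants.

```python
# String values of selenium's By.CSS_SELECTOR and By.XPATH constants.
CSS = "css selector"
XPATH = "xpath"


def find_submit_buttons(driver):
    """Locate the same elements twice: once by CSS selector, once by XPath."""
    by_css = driver.find_elements(CSS, 'form input[type="submit"]')
    by_xpath = driver.find_elements(XPATH, '//form//input[@type="submit"]')
    return by_css, by_xpath
```

CSS selectors tend to be shorter for attribute and descendant matches like this one. XPath is needed when the locator has to walk upward to a parent or match on text content, which CSS selectors cannot express.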

Get all values of specific key with xpath (python web scraping)

眉间皱痕 posted on 2020-04-30 06:27:26
Question: Suppose we have a web page <div class="specific-row" data-id="101736782"></div> <div class="yellow-box-row" data-id="112376244"></div> <div class="specific-row" data-id="179218312"></div> <div class="vip-row" data-id="123749014"></div> How can I get all data-id values? Like ['101736782', '112376244', '179218312', '123749014'] I used tree.xpath: import requests from lxml import html r = requests.get(url) tree = html.fromstring(r.content) tree.xpath("//div@data-id=['any']")
Answer 1: I tried this... from
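
The expression in the question is not valid XPath. With lxml, selecting the attribute directly as `//div/@data-id` returns the values as strings. The sketch below shows the same idea with only the standard library, as an assumption-laden stand-in: `xml.etree.ElementTree` supports a limited XPath subset and needs well-formed markup, so the four rows are wrapped in a `<body>` element that is not in the original page, and `data_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The rows from the question, wrapped in a single root so the markup parses.
PAGE = """<body>
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
</body>"""


def data_ids(markup):
    """Collect the data-id value of every <div> that has that attribute."""
    root = ET.fromstring(markup)
    # [@data-id] keeps only elements that carry the attribute at all.
    return [div.get("data-id") for div in root.iterfind(".//div[@data-id]")]


print(data_ids(PAGE))
# ['101736782', '112376244', '179218312', '123749014']
```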

How do I find all nodes without children (starting from a non-root node!) in XPath/R?

女生的网名这么多〃 posted on 2020-04-30 06:24:25
Question: I know how to find all nodes that don't have a child node: library(rvest) library(magrittr) doc <- "https://www.r-bloggers.com/" %>% GET %>% content leafes <- doc %>% html_nodes(xpath = "//*[not(descendant::*)]") length(leafes) Now I try the same from nodes that are not the root node: doc <- "https://www.r-bloggers.com/" %>% GET %>% content tags <- doc %>% html_nodes(xpath = "/html/body/div/div/div/div/h2/a") nonRootNodeWithChildr <- tags %>% html_nodes(xpath = "..") %>% html_nodes(xpath = "..
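
In XPath, an expression that begins with `//` is evaluated from the document root, whatever the context node is. Prefixing it with a dot, as in `.//*[not(*)]`, restricts the search to descendants of the current node. The sketch below illustrates that scoping in Python rather than R, on a small made-up document. The standard library's ElementTree has no `not()` function, so the leaf test is done in Python; `leaves_below` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A tiny made-up page, not the one from the question.
DOC = ET.fromstring(
    "<html><body>"
    "<div id='a'><h2><a>first</a></h2><p>text</p></div>"
    "<div id='b'><h2><a>second</a></h2></div>"
    "</body></html>"
)


def leaves_below(context):
    """Elements under `context` (excluding it) that have no child elements.

    Mirrors the relative XPath .//*[not(*)] evaluated on `context`.
    """
    return [el for el in context.iter() if el is not context and len(el) == 0]


first_div = DOC.find("./body/div")
print([el.tag for el in leaves_below(first_div)])  # leaves inside the first <div>
print(len(leaves_below(DOC)))                      # leaves in the whole document
```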

How to get Absolute XPath in Chrome or Firefox?

偶尔善良 posted on 2020-04-30 05:07:30
Question: As written in the title, I need the absolute XPath and not the relative one. I mean, I need something like this: html/body/div[1]/section/div[1]/div/div/div/div[1]/div/div/div/div/div[3]/div[1]/div/h4[1]/b and not like this: //*[@class='featured-box']//*[text()='Testing'] In both browsers, when I inspect the code and use right click -> Copy XPath, I get the relative path. Please note, I am using Firefox Quantum and I cannot use Firebug or FirePath because they are not supported.
Answer 1:
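
For illustration, an absolute path of the kind shown in the question can also be derived programmatically by walking down from the root and recording each element's position among its same-named siblings. This is a minimal Python sketch on well-formed markup using the standard library, not a browser feature; the function name `absolute_xpaths` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def absolute_xpaths(root):
    """Map every element under `root` to an absolute, index-based XPath.

    An index such as div[2] is added only when a parent has several
    children with the same tag name.
    """
    paths = {root: "/" + root.tag}
    stack = [root]
    while stack:
        parent = stack.pop()
        # Count same-named siblings first to know whether an index is needed.
        totals = {}
        for child in parent:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in parent:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if totals[child.tag] > 1:
                step += "[%d]" % seen[child.tag]  # XPath positions are 1-based
            paths[child] = paths[parent] + "/" + step
            stack.append(child)
    return paths


doc = ET.fromstring(
    "<html><body><div><h4><b>Testing</b></h4></div><div/></body></html>"
)
print(absolute_xpaths(doc)[doc.find(".//b")])
# /html/body/div[1]/h4/b
```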

Experiment: The 13 axes in XPath

混江龙づ霸主 posted on 2020-04-26 06:35:31
XSLT uses XPath to locate information in an XML document. While studying how XSLT resolves paths over the past few days, I wrote the following example to deepen my understanding of the individual XPath axes. The XML and XSLT documents used for testing. XML document: LogReport.xml <?xml version="1.0" encoding="gb2312"?> <?xml-stylesheet type='text/xsl' href='LogReport.xslt'?> <LogReport CreateTime="2015/2/7 20:34:17"> <Data>DataA</Data> <Data>DataB</Data> <Data>DataC</Data> <LogList ListName="XXX"> <Log LogLevel="0" LogItem="Zhang" Description="Log1"> <LogMessage Message="abcdefg" /> </Log> <Log LogLevel="0" LogItem="Wang" Description="Log2"> <LogMessage Message="hijklmn" /> </Log> <Log LogLevel="1" LogItem="Lee" Description="Log3"> <LogMessage Message="opqrst"
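
A few of the axes can be tried without an XSLT processor. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset (child, descendant via `//`, parent via `..`, and attribute predicates), so it covers just those steps and not all 13 axes. The document is an abbreviated copy of LogReport.xml limited to the elements that appear in full above; that shortening is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated LogReport.xml: only the elements shown in full in the excerpt.
XML = """<LogReport CreateTime="2015/2/7 20:34:17">
  <Data>DataA</Data>
  <Data>DataB</Data>
  <Data>DataC</Data>
  <LogList ListName="XXX">
    <Log LogLevel="0" LogItem="Zhang" Description="Log1">
      <LogMessage Message="abcdefg" />
    </Log>
    <Log LogLevel="0" LogItem="Wang" Description="Log2">
      <LogMessage Message="hijklmn" />
    </Log>
  </LogList>
</LogReport>"""

root = ET.fromstring(XML)

# child axis: direct <Data> children of the root element
child_data = [d.text for d in root.findall("Data")]

# descendant axis: every <LogMessage> at any depth below the root
messages = [m.get("Message") for m in root.findall(".//LogMessage")]

# parent axis: step from each <LogMessage> back up to its <Log>
parents = [log.get("LogItem") for log in root.findall(".//LogMessage/..")]

# attribute predicate: the <Log> whose LogItem attribute equals "Wang"
wang = [log.get("Description") for log in root.findall(".//Log[@LogItem='Wang']")]

print(child_data, messages, parents, wang)
```

Axes such as following-sibling, preceding, or ancestor need a full XPath 1.0 engine, which ElementTree does not provide.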

How to replace \n with <br> in Telegram Instant View

巧了我就是萌 posted on 2020-04-19 19:02:06
Question: I am trying to set up a Telegram Instant View for a website. I have text with a lot of line breaks (\n) and no <br>, so I need a way to replace every \n with <br>.
Answer 1: Try the @replace function: @replace("\\n", "<br>"): $body//p
Answer 2: There is no way (in the Instant View DSL) to replace a part of a text node with an HTML tag (i.e. element node). Any HTML you insert as text will be escaped.
Answer 3: As I remember, if you debug $paragraph/text() , there will be a lot of text nodes, that are separated
