xpath

finding children in php simplexml xpath

爱⌒轻易说出口 submitted on 2020-01-16 03:14:12
Question: I am running an XPath query on an XML stream and retrieving a set of data. In that data I need to find the tag name, but I'm not able to find a way to retrieve it. The XML stream is <Condition> <Normal dataItemId="Xovertemp-06" timestamp="2011-09-02T03:35:34.535703Z" name="Xovertemp" sequence="24544" type="TEMPERATURE"/> <Normal dataItemId="Xservo-06" timestamp="2011-09-02T03:35:34.535765Z" name="Xservo" sequence="24545" type="LOAD"/> <Normal dataItemId="Xtravel-06" timestamp="2011-09
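Queries such as ./* return the matching elements without telling you their names, so the name has to be read back from each result. In PHP SimpleXML that is SimpleXMLElement::getName(), and in XPath 1.0 itself the name() function. A minimal sketch of the same idea in Python's standard xml.etree.ElementTree, where the name is the element's .tag. The sample XML reuses only the two complete <Normal> elements from the question, because the rest of the stream is truncated. The helper name child_tags is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample built from the two complete <Normal> elements quoted in the question;
# the rest of the original stream is truncated, so it is omitted here.
SAMPLE = """
<Condition>
  <Normal dataItemId="Xovertemp-06" timestamp="2011-09-02T03:35:34.535703Z"
          name="Xovertemp" sequence="24544" type="TEMPERATURE"/>
  <Normal dataItemId="Xservo-06" timestamp="2011-09-02T03:35:34.535765Z"
          name="Xservo" sequence="24545" type="LOAD"/>
</Condition>
"""


def child_tags(xml_text):
    """Return (tag name, dataItemId) for every direct child of the root element."""
    root = ET.fromstring(xml_text.strip())
    # "./*" matches element children whatever their tag is,
    # so the tag name is read back from each matched element.
    return [(child.tag, child.get("dataItemId")) for child in root.findall("./*")]


for tag, item_id in child_tags(SAMPLE):
    print(tag, item_id)
```

Because ./* matches any child, the same loop also reports differently named siblings if the stream mixes element types under <Condition>.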

Simple XPath predicate not working

放肆的年华 submitted on 2020-01-16 02:33:18
Question: I'm running the following PHP snippet that evaluates an XPath query against the source of an HTML page. The query seems correct and I tested it with some online XPath testers, but I don't get any matches. <?php $details = new DOMDocument(); @$details->loadHTMLFile('http://www.astagiudiziaria.com/beni/lotto_unico_genova_via_della_pigna_6b_-_proc_n_583_14_trib_di_genova/index.html'); $xpath = new DOMXpath($details); $procedimento = $xpath->query('.//ul[preceding-sibling::h2="Informazioni sulla
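One thing worth checking with a predicate like preceding-sibling::h2="…": in XPath 1.0, comparing a node-set with a string is true only if some node's string value is exactly equal to the string. Padding or line breaks inside the <h2> therefore make it fail, and normalize-space() is the usual guard. Whether that is the cause here can't be confirmed, since the query is truncated. ElementTree can't evaluate preceding-sibling, so this sketch emulates .//ul[preceding-sibling::h2[normalize-space()=label]] by hand on a small hypothetical fragment. The headings "Section A"/"Section B" and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; note the padding inside the first <h2>.
PAGE = """<div>
  <h2>
     Section A
  </h2>
  <ul><li>first</li></ul>
  <h2>Section B</h2>
  <ul><li>second</li></ul>
</div>"""


def normalize_space(text):
    """Python counterpart of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((text or "").split())


def uls_after_heading(root, label, normalize=True):
    """Emulate .//ul[preceding-sibling::h2[normalize-space()=label]].

    For each parent, walk its children in document order; once ANY earlier
    <h2> sibling matches, every later <ul> sibling is selected.
    """
    matches = []
    for parent in root.iter():
        seen_heading = False
        for child in parent:
            if child.tag == "h2":
                text = "".join(child.itertext())
                if normalize:
                    text = normalize_space(text)
                if text == label:
                    seen_heading = True
            elif child.tag == "ul" and seen_heading:
                matches.append(child)
    return matches


root = ET.fromstring(PAGE)
print([li.text for ul in uls_after_heading(root, "Section A") for li in ul])
print(len(uls_after_heading(root, "Section A", normalize=False)))
```

The exact comparison finds nothing, and the normalized one finds both lists. Because preceding-sibling looks at all earlier siblings, the second <ul> matches too. Binding each list to its nearest heading would use preceding-sibling::h2[1] instead.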

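Notes on scraping a certain direct-hiring job site

一世执手 submitted on 2020-01-16 00:43:51
Recently a friend asked me to help scrape the job postings on a certain direct-hiring recruitment site, and since I had nothing else to do I gave it a try. A site like this is bound to have anti-scraping measures, so I scraped it with Selenium + Chrome. Main tools used: Python 3.5, Selenium, Scrapy. Because the site's data can be queried by city, I first fetch the list of cities the site supports, requesting it through Scrapy's self.start_urls: self.start_urls = ['https://www.zhipin.com/wapi/zpCommon/data/city.json'] At the same time, Selenium requests the site's home page: self.driver.get('https://www.zhipin.com/') It later turned out that the site could detect Selenium and returned no data, so I added: options = webdriver.ChromeOptions() options.add_experimental_option('excludeSwitches', ['enable-automation']) self.driver = webdriver.Chrome(options=options) This puts the browser into developer mode, and the data can then be requested normally. The next step is to parse the names of the cities that support search and collect them into a data format we can use.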
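The post stops before showing that parsing step. As a sketch only, the function below flattens a nested city list into a {name: code} dictionary. The JSON shape used here (zpData → cityList → subLevelModelList, with name and code fields) is an assumption for illustration, not confirmed by the post. The city names and codes are placeholders. Check the real city.json response before relying on any of it.

```python
import json

# ASSUMED response shape, for illustration only: provinces in
# zpData.cityList, each holding its cities in subLevelModelList.
SAMPLE_RESPONSE = json.dumps({
    "zpData": {
        "cityList": [
            {"name": "ProvinceA", "code": 1, "subLevelModelList": [
                {"name": "CityA1", "code": 101},
                {"name": "CityA2", "code": 102},
            ]},
            {"name": "ProvinceB", "code": 2, "subLevelModelList": [
                {"name": "CityB1", "code": 201},
            ]},
        ]
    }
})


def flatten_cities(raw_json):
    """Collect every city under every province into a {name: code} dict."""
    data = json.loads(raw_json)
    cities = {}
    for province in data["zpData"]["cityList"]:
        # Provinces without a (non-empty) city list contribute nothing.
        for city in province.get("subLevelModelList") or []:
            cities[city["name"]] = city["code"]
    return cities


print(flatten_cities(SAMPLE_RESPONSE))
```

Inside a Scrapy parse callback the raw body would be passed as flatten_cities(response.text). The resulting dictionary can then drive one search request per city.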

Perl - XML::LibXML - getting elements that have certain attributes

狂风中的少年 submitted on 2020-01-16 00:34:09
Question: I have a problem I am hoping someone can help with... I have the following example XML structure: <library> <book> <title>Perl Best Practices</title> <author>Damian Conway</author> <isbn>0596001738</isbn> <pages>542</pages> <image src="http://www.oreilly.com/catalog/covers/perlbp.s.gif" width="145" height="190" /> </book> <book> <title>Perl Cookbook, Second Edition</title> <author>Tom Christiansen</author> <author>Nathan Torkington</author> <isbn>0596003137</isbn> <pages>964</pages> <image
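XML::LibXML's findnodes accepts full XPath 1.0, so a query like //book[image/@width="145"]/title can express "titles of books whose image has this attribute value" directly. Python's ElementTree implements only a subset of XPath. The [@attr] and [@attr='value'] predicates work, but a nested path such as image/@width inside a predicate does not, so this sketch does that part of the filtering in Python. The first <book> is copied from the question. The second is truncated there, so it appears here without an <image>. The width 145 and the helper name are just examples.

```python
import xml.etree.ElementTree as ET

# First <book> copied from the question; the second is truncated there,
# so its <image> element is simply left out of this sample.
LIBRARY = """<library>
  <book>
    <title>Perl Best Practices</title>
    <author>Damian Conway</author>
    <isbn>0596001738</isbn>
    <pages>542</pages>
    <image src="http://www.oreilly.com/catalog/covers/perlbp.s.gif" width="145" height="190" />
  </book>
  <book>
    <title>Perl Cookbook, Second Edition</title>
    <author>Tom Christiansen</author>
    <author>Nathan Torkington</author>
    <isbn>0596003137</isbn>
    <pages>964</pages>
  </book>
</library>"""

root = ET.fromstring(LIBRARY)

# [@src] keeps elements that carry the attribute at all...
images_with_src = root.findall(".//image[@src]")
# ...and [@width='145'] keeps only those with that exact value.
wide_images = root.findall(".//image[@width='145']")


def titles_with_image_attr(root, attr, value):
    """Titles of books with an <image> whose attr == value (//book[image/@attr=value]/title)."""
    titles = []
    for book in root.findall("book"):
        # Like the XPath predicate, any matching <image> child qualifies the book.
        if any(img.get(attr) == value for img in book.findall("image")):
            titles.append(book.findtext("title"))
    return titles


print(len(images_with_src), len(wide_images))
print(titles_with_image_attr(root, "width", "145"))
```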

How to extract a list of label/value pairs with Scrapy when HTML tags are missing

…衆ロ難τιáo~ submitted on 2020-01-15 20:18:09
Question: I am currently processing a document with <b> label1 </b> value1 <br> <b> label2 </b> value2 <br> .... I can't figure out a clean XPath approach for this in Scrapy. Here is my best implementation: hxs = HtmlXPathSelector(response) section = hxs.select(..............) values = section.select("text()[preceding-sibling::b/text()]") labels = section.select("text()/preceding-sibling::b/text()") but I am not comfortable with this approach of matching the nodes of both lists by index. I'd rather
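One way to avoid pairing two lists by index is to start from each <b> and read the text that follows it. In XPath that is following-sibling::text()[1], evaluated relative to each <b> inside a loop. In Python's ElementTree the text right after an element's end tag is its .tail, which for this markup is the same node. This sketch assumes well-formed markup (<br/> rather than <br>) so the standard-library parser accepts it. The wrapping <p> is hypothetical, since the question's section selector is elided.

```python
import xml.etree.ElementTree as ET

# The question's markup, made well-formed (<br/>) and wrapped in a
# hypothetical <p> so ElementTree can parse it.
FRAGMENT = "<p><b> label1 </b> value1 <br/><b> label2 </b> value2 <br/></p>"


def label_value_pairs(container):
    """Pair each <b> label with the text that immediately follows it.

    .tail is the text between an element's end tag and the next node; here it
    is the same text node XPath reaches with following-sibling::text()[1].
    """
    pairs = {}
    for b in container.findall("b"):
        label = (b.text or "").strip()
        value = (b.tail or "").strip()
        pairs[label] = value
    return pairs


print(label_value_pairs(ET.fromstring(FRAGMENT)))
```

Because each value is read relative to its own label, a missing value yields an empty string for that label instead of shifting every later pair.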

Selecting Nodes in XSD schema using Xpath

倾然丶 夕夏残阳落幕 submitted on 2020-01-15 18:52:07
Question: I have the following code that I wish to use to select all the elements I will need in a certain sequence. Here's the snippet: XmlDocument schema = new XmlDocument(); schema.Load(SchemaFileName); XmlNamespaceManager xnm = new XmlNamespaceManager(schema.NameTable); xnm.AddNamespace("xs", "http://www.w3.org/2001/XMLSchema"); XmlNodeList list = schema.SelectNodes(Path); However, I'm not sure what I should write as the path. Ideally I want to select all the child nodes of the "sequence" tag, but
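A path such as //xs:sequence/* selects every element child of every xs:sequence. In .NET the xs prefix only resolves if the namespace manager is passed along, as schema.SelectNodes(path, xnm). The one-argument overload used above would reject a prefixed path. This sketch shows the same prefix-to-URI mapping with ElementTree's namespaces argument. The schema (a person with first, last and age) is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
NS = {"xs": XS}  # prefix -> URI map, the role XmlNamespaceManager plays in .NET

# Hypothetical schema for illustration.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="last" type="xs:string"/>
        <xs:element name="age" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# ".//xs:sequence/*" = every element child of every xs:sequence.
children = root.findall(".//xs:sequence/*", NS)
print([c.get("name") for c in children])

# Tags come back in {uri}local form, so compare against the expanded name.
print(all(c.tag == f"{{{XS}}}element" for c in children))
```

The prefix in the query is only a local alias. What must match is the namespace URI, so a schema that binds a different prefix to the same URI is still found.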