domxpath | 易学教程

PHP xpath query on XML with default namespace binding

I have one solution to the problem in the title, but it's a hack, and I'm wondering if there's a better way to do this. Below are a sample XML file and a PHP CLI script that executes an XPath query given as an argument. For this test case, the command line is: ./xpeg "//MainType[@ID=123]". What seems strangest is this line, without which my approach doesn't work: $result->loadXML($result->saveXML($result)); As far as I know, this simply re-parses the modified XML, and it seems to me that this …
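
XPath 1.0 has no notion of a default namespace: an unprefixed name such as MainType only matches elements that are in no namespace. A common alternative to re-parsing the document is to bind a prefix of your choice to the namespace URI with DOMXPath::registerNamespace() and use that prefix in the query. This is a minimal sketch; because the excerpt does not include the original file, the sample document and the namespace URI http://example.com/ns are made up, and the prefix x is arbitrary.

```php
<?php
// Hypothetical sample: the real file from the question is not shown, so the
// namespace URI and element layout here are placeholders.
$xml = <<<XML
<Root xmlns="http://example.com/ns">
  <MainType ID="123"><Name>first</Name></MainType>
  <MainType ID="456"><Name>second</Name></MainType>
</Root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// XPath 1.0 has no default namespace, so bind an arbitrary prefix ("x")
// to the document element's namespace URI and use it in the expression.
$xpath->registerNamespace('x', $doc->documentElement->namespaceURI);

$nodes = $xpath->query('//x:MainType[@ID=123]');
foreach ($nodes as $node) {
    echo $node->getAttribute('ID'), "\n";
}
```

With this approach the query passed on the command line has to carry the prefix as well (//x:MainType[@ID=123]). An expression such as //*[local-name()='MainType'] avoids the prefix, at the cost of ignoring the namespace entirely.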

PHP xpath contains class and does not contain class

The title sums it up. I'm trying to query an HTML file for all div tags that contain the class result and do not contain the class grid.

<div class="result grid">skip this div</div>
<div class="result">grab this one</div>

Thanks!

This should do it:

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query(
    "//div[contains(@class, 'result') and not(contains(@class, 'grid'))]");
foreach ($nodeList as $node) {
    echo $node->nodeName . "\n";
}

Your XPath would be //div[contains(concat(' ', @class, ' '), ' result ') and not(contains …
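
The contains(@class, 'result') test is a substring match, so it would also select a div whose class is results (and would exclude one whose class is, say, gridline). Padding the attribute with spaces, as the second answer begins to do, matches whole class tokens instead. A minimal sketch of that technique; the extra results div and the $hasClass helper are illustrative additions, not part of the original question, and the arrow function needs PHP 7.4 or later.

```php
<?php
$html = '<div class="result grid">skip this div</div>'
      . '<div class="result">grab this one</div>'
      . '<div class="results">not a match either</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the normalized class attribute with spaces so each class name
// is matched as a whole token rather than as a substring.
$hasClass = fn(string $c) => "contains(concat(' ', normalize-space(@class), ' '), ' $c ')";
$query = "//div[" . $hasClass('result') . " and not(" . $hasClass('grid') . ")]";

$texts = [];
foreach ($xpath->query($query) as $div) {
    $texts[] = $div->textContent;
}
print_r($texts);
```

Only the second div is selected: the first is excluded by the grid token, and results does not contain the padded token " result ".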

DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class

I'm building a command-line PHP scraping app that uses XPath to analyze the HTML. The problem is that every time a new DOMXPath instance is created in a loop, I lose memory roughly equal to the size of the XML being loaded. The script keeps running, slowly building up memory usage until it hits the limit and quits. I've tried forcing garbage collection with gc_collect_cycles(), but PHP still isn't reclaiming the memory from old XPath requests. Indeed, the DOMXPath class definition doesn't even seem to include a destructor. So my question is: is there any way to …
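
The excerpt cuts off before any answer, so the actual cause of the growth is not shown here. One pattern that helps rule out the usual suspects is sketched below, under stated assumptions: each page's DOMDocument and DOMXPath stay local to a function so no references survive the iteration, only plain strings are returned, and libxml_clear_errors() is called because errors collected under libxml_use_internal_errors(true) stay buffered until they are cleared. The function name extractTitles, the //h1 query, and the generated page strings are hypothetical.

```php
<?php
// Keep per-page DOM objects local so they are released when the function returns.
function extractTitles(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $titles = [];
    foreach ($xpath->query('//h1') as $h1) {
        $titles[] = trim($h1->textContent);
    }
    // Errors collected under libxml_use_internal_errors(true) stay buffered
    // until cleared, so clear them after each page.
    libxml_clear_errors();

    return $titles; // only plain strings escape, no DOMNode references
}

libxml_use_internal_errors(true);

$start = memory_get_usage();
for ($i = 0; $i < 1000; $i++) {
    // Hypothetical page content standing in for scraped HTML.
    $page = "<html><body><h1>Page $i</h1><p>body text</p></body></html>";
    $titles = extractTitles($page);
}
$growth = memory_get_usage() - $start;

echo "last titles: ", implode(', ', $titles), "\n";
echo "memory growth: $growth bytes\n";
```

Printing memory_get_usage() per iteration in the real scraper is a cheap way to check whether the growth really tracks the DOMXPath objects or something else kept alive across iterations.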

What is the difference between DOMXPath::evaluate and DOMXPath::query?

Trying to decide which is more appropriate for my use case... After comparing the documentation for these methods, my vague understanding is that evaluate returns a typed result but query doesn't. Furthermore, the query example loops through many results, but the evaluate example assumes a single typed result. I'm still not much the wiser! Could anyone explain (in as close to layman's terms as possible) when you would use one or the other? For example, will the multiple/single results mentioned above always be the case?

ThW: DOMXPath::query() supports only expressions that return a node list.
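
A small sketch of the distinction, using a made-up sample document: query() is meant for location paths and hands back a DOMNodeList, while evaluate() returns whatever type the expression yields, so count() comes back as a PHP float, string() as a string, boolean() as a bool, and a plain location path still comes back as a DOMNodeList.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>a</item><item>b</item><item>c</item></list>');
$xpath = new DOMXPath($doc);

// query(): for location paths; the result is a DOMNodeList.
$items = $xpath->query('//item');
echo get_class($items), ' with ', $items->length, " nodes\n";

// evaluate(): the PHP type follows the XPath result type.
$count = $xpath->evaluate('count(//item)');            // number  -> float
$first = $xpath->evaluate('string(//item[1])');        // string  -> string
$hasB  = $xpath->evaluate('boolean(//item[. = "b"])'); // boolean -> bool
$nodes = $xpath->evaluate('//item');                   // node-set -> DOMNodeList

var_dump($count, $first, $hasB);
```

So the "many results vs. one result" impression comes from the expressions used in the manual's examples, not from the methods themselves: evaluate() with a location path returns a node list just like query().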

unable to scrape content from a website

I am trying to scrape some content from a website, but the code below is not working (it doesn't show any output). Here is the code:

$url = "some url";
$otherHeaders = ""; // here I am using some other headers like Content-Type, User-Agent, etc.
// some curl to get the webpage ... ..
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);
$page = new DOMDocument();
$xpath = new DOMXPath($page);
$content = getXHTML($content); // this is a tidy function to convert bad HTML to XHTML
$page->loadHTML($content); // it's okay up to here: when I echo $page->saveHTML the page is displayed
$path1="//body …
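
The excerpt stops before the XPath query itself, so it is not clear where the output is lost. Below is a minimal sketch of the conventional order of operations, assuming $content already holds the fetched HTML: load the HTML into the DOMDocument first, construct the DOMXPath afterwards, then query. The helper queryHtml and the sample HTML are hypothetical; with real cURL output it is also worth checking whether curl_exec() returned false before parsing anything.

```php
<?php
// Hypothetical helper: parse the HTML first, then build the XPath object on it.
function queryHtml(string $content, string $expression): array
{
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $page = new DOMDocument();
    $page->loadHTML($content);
    libxml_clear_errors();

    $xpath = new DOMXPath($page);       // created after loadHTML()
    $result = [];
    foreach ($xpath->query($expression) as $node) {
        $result[] = trim($node->textContent);
    }
    return $result;
}

// Stand-in for the string returned by curl_exec().
$content = '<html><body><div id="main"><p>Hello</p><p>World</p></div></body></html>';
print_r(queryHtml($content, '//body//p'));
```

Returning plain strings rather than the DOMNodeList keeps the caller independent of the document's lifetime.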

DOMXpath - Get href attribute and text value of an a element

So I have an HTML string like this:

<td class="name"> <a href="/blah/somename23123">Some Name</a> </td>
<td class="name"> <a href="/blah/somename28787">Some Name2</a> </td>

Using XPath, I'm able to get the value of the href attribute with this query:

$domXpath = new \DOMXPath($this->domPage);
$hrefs = $domXpath->query("//td[@class='name']/a/@href");
foreach($hrefs as $href) {...}

And it's even easier to get the text value, like this:

// XPath automatically strips any HTML tags, so we are
// left with the clean text value of the a element
$domXpath = new \DOMXPath($this->domPage);
$names = $domXpath->query("//td[ …
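
Rather than running two separate queries, one option is to select the a elements themselves and read both values from each node: getAttribute('href') for the link and textContent for the text. A minimal sketch using the markup from the question; wrapping the cells in a table is an assumption made so the td elements parse in context.

```php
<?php
$html = '<table><tr>'
      . '<td class="name"> <a href="/blah/somename23123">Some Name</a> </td>'
      . '<td class="name"> <a href="/blah/somename28787">Some Name2</a> </td>'
      . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the <a> elements once, then read both values from each node.
$links = [];
foreach ($xpath->query("//td[@class='name']/a") as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}
print_r($links);
```

Keeping href and text together per element also guarantees they stay paired, which two independent node lists only do implicitly by position.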
