domxpath

PHP xpath query on XML with default namespace binding

蓝咒 submitted on 2019-11-30 17:16:02
Question: I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this. Below is a sample XML file and a PHP CLI script that executes an XPath query given as an argument. For this test case, the command line is:
    ./xpeg "//MainType[@ID=123]"
What seems most strange is this line, without which my approach doesn’t work:
    $result->loadXML($result->saveXML($result));
As far as I know, this simply re-parses the modified XML, and it seems to me that this
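A common alternative to re-parsing is to bind the document's default namespace to a prefix before querying, since XPath 1.0 has no default namespace of its own. The sketch below is only an illustration under assumptions: the sample XML, the namespace URI `http://example.com/ns`, and the prefix `d` are hypothetical and not taken from the original question.

```php
<?php
// Hypothetical sample document; the namespace URI and element names are
// illustrative only, not the asker's actual file.
$xml = <<<XML
<Root xmlns="http://example.com/ns">
  <MainType ID="123"><Name>first</Name></MainType>
  <MainType ID="456"><Name>second</Name></MainType>
</Root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Unprefixed names in XPath 1.0 match only elements in no namespace, so bind
// the document's default namespace URI to an arbitrary prefix ("d").
$xpath->registerNamespace('d', $doc->documentElement->namespaceURI);

$nodes = $xpath->query('//d:MainType[@ID=123]');
foreach ($nodes as $node) {
    echo $node->nodeName, ' ID=', $node->getAttribute('ID'), "\n";
}
```

With this approach the query has to carry the prefix (`//d:MainType` rather than `//MainType`), so a CLI wrapper would need to either document that prefix or rewrite the user's expression.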

PHP xpath contains class and does not contain class

巧了我就是萌 submitted on 2019-11-30 04:44:38
The title sums it up. I'm trying to query an HTML file for all div tags that contain the class result and do not contain the class grid.
    <div class="result grid">skip this div</div>
    <div class="result">grab this one</div>
Thanks!
Answer: This should do it:
    <?php
    $doc = new DOMDocument();
    $doc->loadHTMLFile('test.html');
    $xpath = new DOMXPath($doc);
    $nodeList = $xpath->query(
        "//div[contains(@class, 'result') and not(contains(@class, 'grid'))]");
    foreach ($nodeList as $node) {
        echo $node->nodeName . "\n";
    }
Your XPath would be //div[contains(concat(' ', @class, ' '), ' result ') and not(contains
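The plain `contains(@class, 'result')` test also matches class names such as `results`, which is why the space-padded `concat` form is often preferred. Here is a minimal sketch of that token-exact match, assuming an inline HTML string instead of `test.html`; the third div and the use of `normalize-space()` are additions for illustration.

```php
<?php
// Inline stand-in for test.html; the "results" div is a hypothetical extra
// case showing why a plain substring match is too loose.
$html = '<div class="result grid">skip this div</div>'
      . '<div class="result">grab this one</div>'
      . '<div class="results">different class</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the class list with spaces so each class name is matched as a whole token.
$class = "concat(' ', normalize-space(@class), ' ')";
$query = "//div[contains($class, ' result ') and not(contains($class, ' grid '))]";

$matches = [];
foreach ($xpath->query($query) as $div) {
    $matches[] = $div->textContent;
}
print_r($matches);
```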

DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class

风格不统一 submitted on 2019-11-29 14:44:06
I'm building a command-line PHP scraping app that uses XPath to analyze the HTML. The problem is that every time a new DOMXPath class instance gets loaded in a loop, I lose memory roughly equal to the size of the XML being loaded. The script runs and runs, slowly building up memory usage until it hits the limit and quits. I've tried forcing garbage collection with gc_collect_cycles(), and PHP still isn't getting back memory from old XPath requests. Indeed, the definition of the DOMXPath class doesn't even seem to include a destructor function. So my question is ... is there any way to
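One pattern that may help in a long-running loop is to keep each DOMDocument and DOMXPath inside a function scope, copy out plain strings rather than nodes, and clear libxml's error buffer after every document. The buffer matters only when `libxml_use_internal_errors(true)` is in effect, because errors then accumulate until they are cleared. This is a sketch, not a confirmed fix for the behaviour described above: `extractLinks()` and the sample pages are hypothetical, and whether memory is actually returned may depend on the PHP and libxml versions.

```php
<?php
// Assumption: errors are collected internally; the buffer then grows until
// libxml_clear_errors() is called.
libxml_use_internal_errors(true);

/** Hypothetical helper: extract link targets from one HTML string. */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $links = [];
    foreach ($xpath->query('//a/@href') as $attr) {
        $links[] = $attr->nodeValue;   // copy plain strings, not DOM nodes
    }

    libxml_clear_errors();             // drop errors buffered for this document
    return $links;                     // $doc and $xpath go out of scope here
}

// Hypothetical stand-ins for the scraped pages.
$pages = [
    '<p><a href="/one">1</a></p>',
    '<p><a href="/two">2</a><a href="/three">3</a></p>',
];

$before = memory_get_usage();
foreach ($pages as $html) {
    $links = extractLinks($html);
    echo implode(', ', $links), "\n";
}
echo 'memory delta: ', memory_get_usage() - $before, " bytes\n";
```

Logging `memory_get_usage()` per iteration, as at the end of the sketch, is a simple way to check whether a given change has any effect.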

What is the difference between DOMXPath::evaluate and DOMXPath::query?

﹥>﹥吖頭↗ submitted on 2019-11-29 10:32:43
Trying to decide which is more appropriate for my use case... After comparing the documentation for these methods, my vague understanding is that evaluate returns a typed result but query doesn't. Furthermore, the query example includes looping through many results, but the evaluate example assumes a single typed result. Still not much the wiser! Could anyone explain (in as close as possible to layman's terms) when you would use one or the other - e.g. will the multiple/single results mentioned above always be the case?
Answer (ThW): DOMXPath::query() supports only expressions that return a node list.
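The distinction can be seen in a small sketch: `query()` is used with a node-set expression, while `evaluate()` also handles expressions whose result is a number, string, or boolean. The XML and variable names below are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical sample data.
$doc = new DOMDocument();
$doc->loadXML('<items><item price="2.5"/><item price="4"/></items>');
$xpath = new DOMXPath($doc);

// query(): node-set expression, yields a DOMNodeList to iterate over.
$items = $xpath->query('//item');
foreach ($items as $item) {
    echo $item->getAttribute('price'), "\n";
}

// evaluate(): the same kind of expression works, but so do scalar results.
$count    = $xpath->evaluate('count(//item)');               // float
$total    = $xpath->evaluate('sum(//item/@price)');          // float
$first    = $xpath->evaluate('string(//item[1]/@price)');    // string
$hasCheap = $xpath->evaluate('boolean(//item[@price < 3])'); // bool

var_dump($count, $total, $first, $hasCheap);
```

So the single versus multiple results seen in the manual's examples come from the expressions used, not from the methods themselves: `evaluate('//item')` would also return a node list.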

DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class

只愿长相守 submitted on 2019-11-28 08:41:47
Question: I'm building a command-line PHP scraping app that uses XPath to analyze the HTML. The problem is that every time a new DOMXPath class instance gets loaded in a loop, I lose memory roughly equal to the size of the XML being loaded. The script runs and runs, slowly building up memory usage until it hits the limit and quits. I've tried forcing garbage collection with gc_collect_cycles(), and PHP still isn't getting back memory from old XPath requests. Indeed the definition of the DOMXPath

What is the difference between DOMXPath::evaluate and DOMXPath::query?

点点圈 submitted on 2019-11-28 03:34:45
Question: Trying to decide which is more appropriate for my use case... After comparing the documentation for these methods, my vague understanding is that evaluate returns a typed result but query doesn't. Furthermore, the query example includes looping through many results, but the evaluate example assumes a single typed result. Still not much the wiser! Could anyone explain (in as close as possible to layman's terms) when you would use one or the other - e.g. will the multiple/single results mentioned

unable to scrape content from a website

微笑、不失礼 submitted on 2019-11-28 00:14:54
I am trying to scrape some content from a website, but the code below is not working (it shows no output). Here is the code:
    $url="some url";
    $otherHeaders=""; //here i am using some other headers like content-type,userAgent,etc
    some curl to get the webpage ... ..
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    $content=curl_exec($ch);curl_close($ch);
    $page=new DOMDocument();
    $xpath=new DOMXPath($page);
    $content=getXHTML($content); //this is a tidy function to convert bad html to xhtml
    $page->loadHTML($content); // its okay till here when i echo $page->saveHTML the page is displayed
    $path1="//body
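One likely cause of the empty result in the code above is the ordering: the DOMXPath object is created before `loadHTML()` is called, so it may still refer to the empty document. The sketch below shows the ordering that is usually recommended, with DOMXPath constructed after the load. The inline `$content` string and the `//body//p` query are hypothetical stand-ins for the fetched page and the truncated `$path1`.

```php
<?php
// Hypothetical stand-in for the page fetched with cURL and cleaned by the
// asker's tidy helper.
$content = '<html><body><div id="main"><p>Hello</p></div></body></html>';

$page = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings from imperfect HTML quiet
$page->loadHTML($content);
libxml_clear_errors();

// Build the XPath object only after the document has been loaded.
$xpath = new DOMXPath($page);

$texts = [];
foreach ($xpath->query('//body//p') as $p) {
    $texts[] = trim($p->textContent);
}
print_r($texts);
```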

DOMXpath - Get href attribute and text value of an a element

ⅰ亾dé卋堺 submitted on 2019-11-27 04:38:04
So I have an HTML string like this:
    <td class="name"> <a href="/blah/somename23123">Some Name</a> </td>
    <td class="name"> <a href="/blah/somename28787">Some Name2</a> </td>
Using XPath I'm able to get the value of the href attribute with this XPath query:
    $domXpath = new \DOMXPath($this->domPage);
    $hrefs = $domXpath->query("//td[@class='name']/a/@href");
    foreach($hrefs as $href) {...}
And it's even easier to get a text value, like this:
    // Xpath auto. strips any html tags so we are
    // left with clean text value of a element
    $domXpath = new \DOMXPath($this->domPage);
    $names = $domXpath->query("//td[
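Rather than running two separate queries, one option is to select the `a` elements once and read both the attribute and the text from each node. This is a minimal sketch, assuming a standalone script in place of `$this->domPage`; the wrapping `<table><tr>` markup and the `$links` array are added for illustration.

```php
<?php
// The two cells from the question, wrapped in a hypothetical table row so the
// fragment parses as ordinary HTML.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$links = [];
// Select each <a> element once, then read the attribute and text from it.
foreach ($xpath->query("//td[@class='name']/a") as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}
print_r($links);
```

Keeping the href and the text in the same loop iteration also guarantees they stay paired, which two independent node lists do not.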

unable to scrape content from a website

会有一股神秘感。 submitted on 2019-11-26 21:39:30
Question: I am trying to scrape some content from a website, but the code below is not working (it shows no output). Here is the code:
    $url="some url";
    $otherHeaders=""; //here i am using some other headers like content-type,userAgent,etc
    some curl to get the webpage ... ..
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    $content=curl_exec($ch);curl_close($ch);
    $page=new DOMDocument();
    $xpath=new DOMXPath($page);
    $content=getXHTML($content); //this is a tidy function to convert bad html to xhtml
    $page-