domxpath | 易学教程

Remove <p><br/></p> with DOMxpath or regex?

Question: I use DOMXPath to remove HTML tags that have no text content, but I want to keep <br/> tags:

    $xpath = new DOMXPath($dom);
    while (($nodeList = $xpath->query('//*[not(text()) and not(node()) and not(self::br)]')) && $nodeList->length > 0) {
        foreach ($nodeList as $node) {
            $node->parentNode->removeChild($node);
        }
    }

It worked perfectly until I came across another problem:

    $content = '<p><br/><br/><br/><br/></p>';

How do I remove this kind of messy <br/> and <p>? In other words, I don't want to allow <br/> on its own.
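
One way to handle the <p><br/><br/><br/><br/></p> case is sketched below. It assumes a paragraph should be dropped whenever it holds nothing but <br> elements and whitespace. A single XPath predicate selects those <p> elements, and each one is removed together with its <br> children. The function name removeBrOnlyParagraphs is hypothetical.

```php
<?php
// Hypothetical helper: delete every <p> whose content is only <br> elements
// and/or whitespace, e.g. <p><br/><br/><br/><br/></p>.
function removeBrOnlyParagraphs(string $html): DOMDocument
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // not(text()[normalize-space()]) : no text child with visible characters
    // not(*[not(self::br)])          : no element child other than <br>
    $query = '//p[not(text()[normalize-space()]) and not(*[not(self::br)])]';

    // Copy the matches into an array before modifying the tree.
    $matches = iterator_to_array($xpath->query($query));
    foreach ($matches as $p) {
        $p->parentNode->removeChild($p);
    }
    return $dom;
}

$dom = removeBrOnlyParagraphs('<div><p><br/><br/><br/><br/></p><p>keep<br/>me</p></div>');
echo $dom->saveHTML($dom->getElementsByTagName('div')->item(0)), "\n";
```

This query also matches a completely empty <p></p>, which fits the original goal of removing empty elements. A <p> that contains text alongside its <br> tags is left alone.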

creating preg_match using xpath in php

Question: I am trying to get the contents using XPath in PHP:

    <div class='post-body entry-content' id='post-body-37'>
      <div style="text-align: left;">
        <div style="text-align: center;">
          Hi
        </div></div></div>

I am using the PHP code below to get the output:

    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $xpath->registerPhpFunctions('preg_match');
    $regex = 'post-(content|[a-z]+)';
    $items = $xpath->query("div[ php:functionString('preg_match', '
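
For reference, a PHP function can only be called from XPath after two steps. First, the php prefix must be bound to the http://php.net/xpath namespace with registerNamespace(). Second, the function must be allowed through registerPhpFunctions(). The pattern passed to preg_match also needs regex delimiters. The query should start with //, because a bare div[...] is evaluated relative to the document node, whose only child here is <html>. The sketch below makes these assumptions. Comparing the result with = 1 is a defensive choice so that a non-matching 0 is never treated as true. findDivIdsByClassRegex is a hypothetical name.

```php
<?php
// Hypothetical helper: return the id of every <div> whose class attribute
// matches a PCRE pattern, calling preg_match from inside the XPath query.
// Assumes $pattern contains no single quote (it is embedded in the query).
function findDivIdsByClassRegex(string $html, string $pattern): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // The php: prefix must be bound to this URI before php:function* can be used.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    // Allow only preg_match to be called from XPath.
    $xpath->registerPhpFunctions('preg_match');

    // Compare with 1 explicitly so a "no match" result of 0 never counts as true.
    $query = sprintf(
        "//div[php:functionString('preg_match', '%s', @class) = 1]",
        $pattern
    );

    $ids = [];
    foreach ($xpath->query($query) as $div) {
        $ids[] = $div->getAttribute('id');
    }
    return $ids;
}

$html = "<div class='post-body entry-content' id='post-body-37'>"
      . '<div style="text-align: left;"><div style="text-align: center;">Hi</div></div></div>';
print_r(findDivIdsByClassRegex($html, '/post-(content|[a-z]+)/'));
```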

How to retrieve node number based on parent node dynamically from xsd file using PHP

Question: I took the tag names from an XSD file and stored them in a database, but I am not able to assign the reference number based on the parent node using PHP. My XSD is sample.xsd:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="shiporder">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="orderperson" type="xs:string"/>
            <xs:element name="shipto">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element
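
The question does not define "reference number". Assuming it means a hierarchical outline number (1, 1.1, 1.2, 1.2.1, ...), one way to derive it with DOMXPath is to recurse from each xs:element into the xs:element nodes that have exactly one more xs:element ancestor. This sketch also assumes every element is declared inline with a name attribute, with no ref= or named global types. numberSchemaElements and numberChildren are hypothetical names. The sample schema contains only the part of sample.xsd that is visible above.

```php
<?php
// Hypothetical helpers: assign outline numbers (1, 1.1, 1.2.1, ...) to the
// xs:element declarations of an XSD, following how they are nested.
function numberChildren(DOMXPath $xpath, DOMNode $context, int $depth, string $prefix, array &$rows): void
{
    // Under $context, the xs:element nodes with exactly $depth xs:element
    // ancestors are its direct children in the element hierarchy.
    $children = $xpath->query(".//xs:element[count(ancestor::xs:element) = $depth]", $context);
    $i = 0;
    foreach ($children as $child) {
        $i++;
        $number = $prefix === '' ? (string) $i : "$prefix.$i";
        $rows[] = [$number, $child->getAttribute('name')];
        numberChildren($xpath, $child, $depth + 1, $number, $rows);
    }
}

function numberSchemaElements(string $xsd): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xsd);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('xs', 'http://www.w3.org/2001/XMLSchema');

    $rows = [];
    // Top-level declarations sit under xs:schema with no xs:element ancestor.
    numberChildren($xpath, $dom->documentElement, 0, '', $rows);
    return $rows;
}

// Trimmed sample based on the visible part of sample.xsd.
$xsd = <<<'XSD'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

foreach (numberSchemaElements($xsd) as [$number, $name]) {
    echo "$number $name\n";
}
```

Each row pairs a number with a tag name, so the pairs could be stored next to the tag names that are already in the database.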

xpath: extract data from a node using xpath

Question: I want to extract only the sales rank (which in this case is 5) from

    Amazon Best Sellers Rank: #5 in Books ( See Top 100 in Books )

on the web page http://www.amazon.com/Mockingjay-Hunger-Games-Book-3/dp/0439023513/ref=tmm_hrd_title_0

So far I have gotten down to this, which selects "Amazon Best Sellers Rank:":

    //li[@id='SalesRank']/b/text()

I am using PHP DOMDocument and DOMXPath.

Answer 1: You can use pure XPath:

    substring-before(normalize-space(/html/body//ul/li[@id="SalesRank"]/b[1]/following
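
Below is a sketch of the pure-XPath approach. It assumes the rank is the text node directly after the <b> label. The markup is a simplified, hypothetical stand-in for the Amazon page, whose real structure may differ. DOMXPath::evaluate() returns a plain PHP string for a string-valued expression, so the number can be cut out between "#" and the next space. extractSalesRank is a hypothetical name.

```php
<?php
// Hypothetical helper: pull the numeric rank out of markup shaped like
// <li id="SalesRank"><b>Amazon Best Sellers Rank:</b> #5 in Books (...)</li>
function extractSalesRank(string $html): ?int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Text node after the <b> label, whitespace-normalized, then the part
    // between '#' and the next space. evaluate() returns a string here.
    $rank = $xpath->evaluate(
        "substring-before(substring-after(normalize-space("
        . "//li[@id='SalesRank']/b[1]/following-sibling::text()[1]), '#'), ' ')"
    );

    // Strip thousands separators in case the rank is shown as e.g. "1,234".
    $rank = str_replace(',', '', $rank);
    return preg_match('/^\d+$/', $rank) ? (int) $rank : null;
}

$html = '<ul><li id="SalesRank"><b>Amazon Best Sellers Rank:</b> #5 in Books '
      . '(<a href="#">See Top 100 in Books</a>)</li></ul>';
var_dump(extractSalesRank($html));
```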

PHP change DOM useragent

Question: I have this simple code to get the title of any page:

    <?php
    $doc = new DOMDocument();
    @$doc->loadHTMLFile('http://www.facebook.com');
    $xpath = new DOMXPath($doc);
    echo $xpath->query('//title')->item(0)->nodeValue."\n";
    ?>

It works fine on every page I have tried, but not on Facebook. There it does not show "Welcome to Facebook - Log In, Sign Up or Learn More"; instead it shows "Update Your Browser | Facebook". I think the problem is the user agent. So is there a
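
loadHTMLFile() opens URLs through PHP's stream layer. One way to send a different User-Agent, sketched here, is to give libxml an HTTP stream context with libxml_set_streams_context(). The helper name fetchTitle and the user-agent string are placeholders. Whether a particular site serves different markup for a given user agent is decided by that site.

```php
<?php
// Hypothetical helper: load a page with a custom User-Agent header and
// return its trimmed <title> text, or null if loading fails or there is none.
function fetchTitle(string $url, string $userAgent): ?string
{
    // libxml uses this context for the streams it opens (e.g. http:// URLs).
    $context = stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
    libxml_set_streams_context($context);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // instead of silencing with @
    $loaded = $doc->loadHTMLFile($url);
    libxml_clear_errors();
    if (!$loaded) {
        return null;
    }

    $title = (new DOMXPath($doc))->query('//title')->item(0);
    return $title === null ? null : trim($title->nodeValue);
}

// Example (network access required; the UA string is only a placeholder):
// echo fetchTitle('http://www.facebook.com', 'Mozilla/5.0 (X11; Linux x86_64)'), "\n";
```

Checking the result of loadHTMLFile() and the null title item also avoids the fatal "call to a member function on null" error that the original one-liner hits on a page without a <title>.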

How to scrape a javascript site using PHP, CURL [duplicate]

Question: This question already has answers here (closed 7 years ago). Possible duplicate: How do I render javascript from another site, inside a PHP application?

This is the site: http://www.oferta.pl/strona_v2/gazeta_v2/. It is built entirely with JavaScript. I want to scrape it using PHP and cURL; currently I use DOMXPath. The left menu has some categories to select, but I see no 'form' there. How can I use cURL to submit that form and scrape the resulting page? I have used file_get_contents()

PHP xpath contains class and does not contain class

Question: The title sums it up. I'm trying to query an HTML file for all div tags that contain the class result and do not contain the class grid.

    <div class="result grid">skip this div</div>
    <div class="result">grab this one</div>

Thanks!

Answer 1: This should do it:

    <?php
    $doc = new DOMDocument();
    $doc->loadHTMLFile('test.html');
    $xpath = new DOMXPath($doc);
    $nodeList = $xpath->query(
        "//div[contains(@class, 'result') and not(contains(@class, 'grid'))]");
    foreach ($nodeList as $node) {
        echo $node-
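
One caveat with contains(@class, 'result'): it tests for a substring, so it also matches classes such as results, and not(contains(@class, 'grid')) would wrongly exclude a class like gridlock. A common refinement, sketched here, pads the normalized class list with spaces and looks for whole tokens. The helper names classToken and textsOfResultDivs are hypothetical, as are the extra sample divs beyond the two from the question.

```php
<?php
// Builds an XPath test that is true when @class contains $name as a whole,
// space-separated token (not merely as a substring).
function classToken(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Hypothetical helper: text of every div with class "result" but not "grid".
function textsOfResultDivs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = '//div[' . classToken('result') . ' and not(' . classToken('grid') . ')]';

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

$html = '<div class="result grid">skip this div</div>'
      . '<div class="result">grab this one</div>'
      . '<div class="results">not a "result" token</div>'
      . '<div class="gridlock result">grab this too</div>';
print_r(textsOfResultDivs($html));
```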

DOMXpath - Get href attribute and text value of an a element

Question: I have an HTML string like this:

    <td class="name">
      <a href="/blah/somename23123">Some Name</a>
    </td>
    <td class="name">
      <a href="/blah/somename28787">Some Name2</a>
    </td>

Using XPath, I'm able to get the value of the href attribute with this query:

    $domXpath = new \DOMXPath($this->domPage);
    $hrefs = $domXpath->query("//td[@class='name']/a/@href");
    foreach($hrefs as $href) {...}

And it's even easier to get the text value, like this:

    // Xpath auto. strips any html tags so we are
    // left with clean
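
Instead of running two queries, one option is to select the <a> elements themselves and read both values from each node: getAttribute('href') for the link and textContent for the text. linksInNameCells is a hypothetical name. The <table><tr> wrapper in the sample exists only so the fragment parses as table cells.

```php
<?php
// Hypothetical helper: return [href, text] pairs for links inside <td class="name">.
function linksInNameCells(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $pairs = [];
    // Select the <a> elements rather than their @href nodes, so both the
    // attribute and the link text are reachable from the same node.
    foreach ($xpath->query("//td[@class='name']/a") as $a) {
        $pairs[] = [$a->getAttribute('href'), trim($a->textContent)];
    }
    return $pairs;
}

$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';
foreach (linksInNameCells($html) as [$href, $text]) {
    echo "$href => $text\n";
}
```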

find redirect META in DOMDocument with DOMXPath

Question: I have the following HTML:

    <html>
      <head>
        <meta http-equiv="refresh" content="0;URL=http://amazingjokes.com" />
      </head>
    </html>

I want to find the META with the redirect, so I wrote the following XPath query:

    /html/head/meta[@http-equiv="refresh"]

However, the '-' in 'http-equiv' is causing an error:

    Invalid regular expression: //html/head/meta[@http-equiv="refresh"]/: Range out of order in character class

How can I properly rewrite the XPath query to be able to find the meta redirect? I
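
/html/head/meta[@http-equiv="refresh"] is valid XPath 1.0, because hyphens are allowed in attribute names. The error message shows the expression wrapped in slashes and compiled as a regular expression. In that regex, the bracketed predicate becomes a character class, and p-e in "http-equiv" is a range whose start comes after its end. That points to the tool evaluating the string, not to the XPath itself. The sketch below runs the query through PHP's DOMXPath. Lowercasing @http-equiv with translate() and extracting the URL from content with a regex are extra assumptions, not part of the question. findMetaRefreshUrl is a hypothetical name.

```php
<?php
// Hypothetical helper: return the redirect target of a <meta http-equiv="refresh">,
// or null if the page has none.
function findMetaRefreshUrl(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // The hyphen in @http-equiv is fine in XPath. translate() lowercases the
    // value so "Refresh" or "REFRESH" also match (XPath 1.0 has no lower-case()).
    $meta = $xpath->query(
        "//meta[translate(@http-equiv, 'REFSH', 'refsh') = 'refresh']"
    )->item(0);
    if ($meta === null) {
        return null;
    }

    // content looks like "0;URL=http://amazingjokes.com"; take what follows "url=".
    if (preg_match('/url\s*=\s*[\'"]?([^\'"\s]+)/i', $meta->getAttribute('content'), $m)) {
        return $m[1];
    }
    return null;
}

$html = '<html><head><meta http-equiv="refresh" content="0;URL=http://amazingjokes.com" /></head></html>';
var_dump(findMetaRefreshUrl($html));
```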

Xpath Table Within Table

Question: I am having a bit of a problem scraping a table-heavy page with DOMXPath. The layout is really ugly: I am trying to get content out of a table within a table within a table. Using Firebug FirePath, I get the following path for the table element:

    html/body/table/tbody/tr[3]/td/table[1]/tbody/tr[2]/td[1]/table[1]/tbody/tr[3]/td[4]

Now, after endless experimenting, I found out that with a standalone table I need to remove the "tbody" tag to make it work. But this doesn't seem
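
A likely reason for the tbody behavior: browsers insert <tbody> into tables while building their DOM, so tools such as Firebug/FirePath show it. libxml's HTML parser, which DOMDocument::loadHTML uses, does not add it when the source omits it. A browser-generated path can therefore be adapted by dropping the /tbody steps, assuming the page source itself contains no <tbody>. The sketch below uses a small two-level table as a stand-in for the real page. The markup and the helper name cellText are hypothetical.

```php
<?php
// Hypothetical helper: evaluate a browser-style path against DOMDocument,
// removing the <tbody> steps the browser inserted but libxml did not.
function cellText(string $html, string $browserPath): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Assumes the source HTML has no explicit <tbody> elements.
    $path = str_replace('/tbody', '', $browserPath);
    $node = (new DOMXPath($dom))->query($path)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$html = <<<'HTML'
<table>
  <tr><td>outer row 1</td></tr>
  <tr><td>
    <table>
      <tr><td>a</td><td>b</td></tr>
      <tr><td>c</td><td>d</td></tr>
    </table>
  </td></tr>
</table>
HTML;

// Path as a browser tool might report it, including the inserted tbody steps.
echo cellText($html, '/html/body/table/tbody/tr[2]/td[1]/table[1]/tbody/tr[2]/td[2]'), "\n";
```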