domxpath

Remove <p><br/></p> with DOMxpath or regex?

ⅰ亾dé卋堺 posted on 2019-12-24 02:25:25
Question: I use DOMXPath to remove HTML tags that have an empty text node while keeping <br/> tags: $xpath = new DOMXPath($dom); while(($nodeList = $xpath->query('//*[not(text()) and not(node()) and not(self::br)]')) && $nodeList->length > 0) { foreach ($nodeList as $node) { $node->parentNode->removeChild($node); } } It worked perfectly until I came across another problem: $content = '<p><br/><br/><br/><br/></p>'; How do I remove this kind of messy <br/> and <p> markup? In other words, I don't want to allow a <br/> on its own
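One possible way to handle the <p><br/><br/></p> case is to treat an element as empty when it has no visible text and no descendant elements other than <br>. The sketch below is not taken from the original thread. The function name removeBrOnlyElements and the wrapper div are hypothetical, and the query also removes other childless elements such as <img>, as the asker's original query did. A stray top-level <br/> is left untouched.

```php
<?php
// Hypothetical helper: drops elements that hold only whitespace and/or <br> tags.
function removeBrOnlyElements(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment so there is a known container to read the result from.
    $dom->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);

    // Not a <br>, no non-whitespace text, and no descendant element except <br>.
    $query = './/*[not(self::br) and not(normalize-space()) and not(.//*[not(self::br)])]';

    // Repeat until nothing matches, since removing a child can empty its parent.
    while (($nodes = $xpath->query($query, $wrapper)) && $nodes->length > 0) {
        foreach ($nodes as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$cleaned = removeBrOnlyElements('<p><br/><br/><br/><br/></p><p>Keep<br/>this</p>');
```

With this input, the first paragraph is dropped and the second keeps its inner <br>, because that paragraph contains text.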

creating preg_match using xpath in php

試著忘記壹切 posted on 2019-12-23 23:56:10
Question: I am trying to get the contents using XPath in PHP. <div class='post-body entry-content' id='post-body-37'> <div style="text-align: left;"> <div style="text-align: center;"> Hi </div></div></div> I am using the PHP code below to get the output. $dom = new DOMDocument; libxml_use_internal_errors(true); $dom->loadHTML($html); $xpath = new DOMXPath($dom); $xpath->registerPhpFunctions('preg_match'); $regex = 'post-(content|[a-z]+)'; $items = $xpath->query("div[ php:functionString('preg_match', '
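The excerpt is cut off before the full query is shown, so the following is only a sketch of how preg_match can be called from an XPath predicate. It assumes the php prefix has to be bound with registerNamespace before php:functionString is usable, that the pattern needs delimiters, and that the match is meant to run against the class attribute. The $matchedIds variable is added for illustration.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class='post-body entry-content' id='post-body-37'>
  <div style="text-align: left;">
    <div style="text-align: center;"> Hi </div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Bind the "php" prefix, then allow only preg_match to be called from XPath.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// preg_match needs delimiters; the pattern must not contain a single quote
// because it is embedded in a single-quoted XPath string literal.
$regex = '/post-(content|[a-z]+)/';
$items = $xpath->query(
    "//div[@class and php:functionString('preg_match', '$regex', @class) > 0]"
);

// Illustrative only: collect the ids of the matching divs.
$matchedIds = [];
foreach ($items as $item) {
    $matchedIds[] = $item->getAttribute('id');
}
```

Only the outer div carries a class attribute, so it is the only candidate the predicate tests.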

How to retrieve node number based on parent node dynamically from xsd file using PHP

不羁的心 posted on 2019-12-23 04:57:17
Question: I took the tag names from an XSD file and stored them in a database, but I am not able to assign the reference number based on the parent node using PHP. My XSD is sample.xsd: <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="shiporder"> <xs:complexType> <xs:sequence> <xs:element name="orderperson" type="xs:string"/> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element

xpath: extract data from a node using xpath

送分小仙女□ posted on 2019-12-23 02:26:07
Question: I want to extract only the sales rank (which in this case is 5) from: Amazon Best Sellers Rank: #5 in Books ( See Top 100 in Books ) on the web page http://www.amazon.com/Mockingjay-Hunger-Games-Book-3/dp/0439023513/ref=tmm_hrd_title_0 So far I have gotten down to this, which selects "Amazon Best Sellers Rank:": //li[@id='SalesRank']/b/text() I am using PHP DOMDocument and DOMXPath. Answer 1: You can use pure XPath: substring-before(normalize-space(/html/body//ul/li[@id="SalesRank"]/b[1]/following
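The answer's XPath is truncated, so the sketch below takes a different route: it reads the text node that follows the <b> label and pulls the number out with a regular expression in PHP. The markup is a hypothetical reduction of the page based on the text quoted in the question, not the real Amazon HTML.

```php
<?php
// Hypothetical markup modeled on the text quoted in the question.
$html = '<ul><li id="SalesRank"><b>Amazon Best Sellers Rank:</b> '
      . '#5 in Books (<a href="#">See Top 100 in Books</a>)</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// string(...) returns the first text node after the <b> label as a PHP string.
$text = $xpath->evaluate(
    'string(//li[@id="SalesRank"]/b/following-sibling::text()[1])'
);

// Take the digits after "#", allowing thousands separators such as "#1,234".
$rank = preg_match('/#([\d,]+)/', $text, $m)
    ? (int) str_replace(',', '', $m[1])
    : null;
```

Doing the numeric extraction in PHP keeps the XPath short and tolerates ranks written with commas.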

PHP change DOM useragent

半腔热情 posted on 2019-12-22 00:23:58
Question: I have this simple code to get the title of any page: <?php $doc = new DOMDocument(); @$doc->loadHTMLFile('http://www.facebook.com'); $xpath = new DOMXPath($doc); echo $xpath->query('//title')->item(0)->nodeValue."\n"; ?> It works fine on all the pages I have tried except Facebook. For Facebook it does not show Welcome to Facebook - Log In, Sign Up or Learn More, but instead shows Update Your Browser | Facebook. I think there is a problem with the user agent. So is there a
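One way to send a custom User-Agent when DOMDocument fetches a URL itself is to hand libxml a stream context via libxml_set_streams_context. This is a sketch, not the thread's accepted answer. The function name fetchTitleWithUserAgent is hypothetical, and whether a given site then serves a different page depends on that site.

```php
<?php
// Hypothetical helper: loads a page with a chosen User-Agent and returns its <title>.
function fetchTitleWithUserAgent(string $url, string $userAgent): ?string
{
    // The "http" wrapper option user_agent sets the User-Agent request header.
    $context = stream_context_create(['http' => ['user_agent' => $userAgent]]);
    // Applies to the next libxml document loaded from a stream.
    libxml_set_streams_context($context);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $loaded = $doc->loadHTMLFile($url);
    libxml_clear_errors();
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $titles = $xpath->query('//title');
    return $titles->length > 0 ? trim($titles->item(0)->nodeValue) : null;
}
```

A call might look like fetchTitleWithUserAgent('http://www.facebook.com', 'Mozilla/5.0 ...'), with the second argument replaced by a full browser User-Agent string.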

How to scrape a javascript site using PHP, CURL [duplicate]

独自空忆成欢 posted on 2019-12-20 07:59:19
Question: This question already has answers here: Closed 7 years ago. Possible Duplicate: How do I render javascript from another site, inside a PHP application? This is the site: http://www.oferta.pl/strona_v2/gazeta_v2/ . The site is built entirely with JavaScript. I want to scrape it using PHP and curl. Currently I use DOMXPath. In the left menu there are some categories to select. I see no 'form' there. How can I use curl to submit that form and scrape the output page? I have used file_get_contents()

PHP xpath contains class and does not contain class

北城以北 posted on 2019-12-18 11:45:27
Question: The title sums it up. I'm trying to query an HTML file for all div tags that contain the class result and do not contain the class grid. <div class="result grid">skip this div</div> <div class="result">grab this one</div> Thanks! Answer 1: This should do it: <?php $doc = new DOMDocument(); $doc->loadHTMLFile('test.html'); $xpath = new DOMXPath($doc); $nodeList = $xpath->query( "//div[contains(@class, 'result') and not(contains(@class, 'grid'))]"); foreach ($nodeList as $node) { echo $node-
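Note that contains(@class, 'result') is a substring test, so it would also match a class such as results. A common XPath 1.0 workaround is to pad the class list with spaces and look for the padded token. The sketch below shows that variant. The function name divsWithClassButNot is hypothetical, and it assumes class names that contain no quotes.

```php
<?php
// Hypothetical helper: text of every <div> having class $wanted but not $unwanted.
function divsWithClassButNot(string $html, string $wanted, string $unwanted): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // " a b " contains " a " only when "a" is a whole class token.
    $padded = "concat(' ', normalize-space(@class), ' ')";
    $query = sprintf(
        "//div[contains(%s, ' %s ') and not(contains(%s, ' %s '))]",
        $padded, $wanted, $padded, $unwanted
    );

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

$found = divsWithClassButNot(
    '<div class="result grid">skip this div</div>'
    . '<div class="result">grab this one</div>'
    . '<div class="results">near miss</div>',
    'result',
    'grid'
);
```

The third div is added here only to show the difference from the plain substring test.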

DOMXpath - Get href attribute and text value of an a element

三世轮回 posted on 2019-12-17 07:31:15
Question: So I have an HTML string like this: <td class="name"> <a href="/blah/somename23123">Some Name</a> </td> <td class="name"> <a href="/blah/somename28787">Some Name2</a> </td> Using XPath I'm able to get the value of the href attribute with this XPath query: $domXpath = new \DOMXPath($this->domPage); $hrefs = $domXpath->query("//td[@class='name']/a/@href"); foreach($hrefs as $href) {...} And it's even easier to get a text value, like this: // Xpath auto. strips any html tags so we are // left with clean
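Both values can be read in one pass by selecting the a elements themselves and then asking each node for its attribute and text. This is a minimal sketch under the assumption that the cells sit inside a table row. The surrounding <table><tr> wrapper and the $links array are added for illustration.

```php
<?php
// Cells from the question, wrapped in a table row so the parser keeps the <td> tags.
$html = '<table><tr>'
      . '<td class="name"> <a href="/blah/somename23123">Some Name</a> </td>'
      . '<td class="name"> <a href="/blah/somename28787">Some Name2</a> </td>'
      . '</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the <a> elements, not their @href, so both pieces come from one node.
$links = [];
foreach ($xpath->query("//td[@class='name']/a") as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->nodeValue),
    ];
}
```

Selecting the element rather than the attribute avoids running two queries and pairing their results by index.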

find redirect META in DOMDocument with DOMXPath

和自甴很熟 posted on 2019-12-14 03:34:16
Question: I have the following HTML: <html> <head> <meta http-equiv="refresh" content="0;URL=http://amazingjokes.com" /> </head> </html> I want to find the META with the redirect, so I wrote the following XPath query: /html/head/meta[@http-equiv="refresh"] However, the '-' in 'http-equiv' is causing an error: Invalid regular expression: //html/head/meta[@http-equiv="refresh"]/: Range out of order in character class How can I properly rewrite the XPath query to be able to find the meta redirect? I
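The excerpt ends before the cause of the error is explained. As far as XPath 1.0 goes, a hyphen is allowed in an attribute name, so the expression can be passed to DOMXPath::query unchanged. The sketch below does that with the HTML from the question. The $redirectUrl extraction is an added illustration.

```php
<?php
// HTML copied from the question.
$html = '<html> <head> <meta http-equiv="refresh" '
      . 'content="0;URL=http://amazingjokes.com" /> </head> </html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The hyphenated attribute name needs no escaping in XPath.
$metas = $xpath->query('/html/head/meta[@http-equiv="refresh"]');
$content = $metas->length > 0 ? $metas->item(0)->getAttribute('content') : null;

// Illustrative extra step: pull the target URL out of "0;URL=...".
$redirectUrl = null;
if ($content !== null && preg_match('/url\s*=\s*(.+)$/i', $content, $m)) {
    $redirectUrl = trim($m[1], " '\"");
}
```

The attribute value comparison is case-sensitive, so a page that writes Refresh with a capital letter would need a translate()-based comparison instead.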

Xpath Table Within Table

我与影子孤独终老i posted on 2019-12-13 03:06:56
Question: I am having a bit of a problem scraping a table-heavy page with DOMXPath. The layout is really ugly, meaning I am trying to get content out of a table within a table within a table. Using Firebug FirePath I get the following path for the table element: html/body/table/tbody/tr[3]/td/table[1]/tbody/tr[2]/td[1]/table[1]/tbody/tr[3]/td[4] Now, after endless experimenting I found out that, with a standalone table, I need to remove the "tbody" tag to make it work. But this doesn't seem
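Browsers insert <tbody> into the DOM they display, while the libxml parser behind DOMDocument generally keeps the table as written, so a path copied from a browser tool can contain steps that do not exist in PHP's tree. The sketch below strips every tbody step from such a path before querying. It assumes the source HTML has no literal <tbody> tags, and the helper name browserPathToDomPath and the reduced nested table are hypothetical.

```php
<?php
// Hypothetical helper: converts a browser-style path into one for DOMXPath.
function browserPathToDomPath(string $path): string
{
    // Drop each "/tbody" step, with or without a position predicate.
    $path = preg_replace('#/tbody(\[\d+\])?#', '', $path);
    // FirePath paths start at "html"; anchor them at the document root.
    return '/' . ltrim($path, '/');
}

// Reduced stand-in for the nested layout: a table inside a table cell.
$html = '<table><tr><td>'
      . '<table><tr><td>a</td><td>inner</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Path as a browser tool would report it for the second inner cell.
$browserPath = 'html/body/table/tbody/tr/td/table/tbody/tr/td[2]';
$domPath = browserPathToDomPath($browserPath);

$cells = $xpath->query($domPath);
$cellText = $cells->length > 0 ? trim($cells->item(0)->nodeValue) : null;
```

If some tables in the source do contain an explicit <tbody>, a descendant step such as table//tr is a looser alternative, at the cost of also matching rows of nested tables.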