domxpath | 易学教程

Too long xpath with DOMXpath query/evaluate return nothing

Question: I am using PHP to retrieve content for a given URL and XPath expression. I use DOMDocument / DOMXPath (with query or evaluate). For short XPath expressions I get the correct result, but longer ones return nothing. (The expressions seem to be valid: I obtained them with XPather, a Firefox plugin, and re-tested them with YQL.) Do you have any advice on this curious problem? Example code: $doc = new DOMDocument(); $myXMLString = file_get_contents('http://stackoverflow.com/questions/4097230/too-long-xpath-with

Replace all images in HTML with text

Question: I am trying to replace all images in some HTML that meet specific requirements with the appropriate text. The requirements are that the image has the class "replaceMe" and that its src filename is in $myArray. From searching for solutions, it appears that some sort of PHP DOM technique is appropriate; however, I am very new to this. For instance, given $html, I wish to return $desired_html. At the bottom of this post is my attempted implementation, which currently doesn't work. Thank
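The excerpt above is cut off before the attempted implementation, so the following is only a minimal sketch of one way to approach it, not the asker's code. The sample $html and the shape of $myArray (filename mapped to replacement text) are assumptions made for illustration.

```php
<?php
// Hypothetical input: the original $html and $myArray are not shown in the excerpt.
$html = '<p><img class="replaceMe" src="/img/a.png"> and '
      . '<img class="keep" src="/img/a.png"> and '
      . '<img class="replaceMe big" src="/img/z.png"></p>';
$myArray = ['a.png' => '[image A]']; // assumed shape: filename => replacement text

libxml_use_internal_errors(true);    // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "replaceMe" as a whole class token, not as a substring of another class.
$query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' replaceMe ')]";

// Copy the node list into an array first, so replacing nodes cannot disturb iteration.
foreach (iterator_to_array($xpath->query($query)) as $img) {
    $file = basename($img->getAttribute('src'));
    if (isset($myArray[$file])) {
        // Swap the <img> element for a plain text node.
        $img->parentNode->replaceChild($doc->createTextNode($myArray[$file]), $img);
    }
}

$remaining = $xpath->query('//img')->length;
$desired_html = $doc->saveHTML($doc->getElementsByTagName('p')->item(0));
echo $desired_html, "\n";
```

Only the first image is replaced here: the second lacks the class, and the third has a filename that is not a key of $myArray.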

DOMXPath object value omitted

Question: I have read many Stack Overflow questions and I'm using this code, but I don't know why it does not work. Here is the code: $url = 'http://m.cricbuzz.com/cricket-schedule'; $source = file_get_contents($url); $doc = new DOMDocument; @$doc->loadHTML($source); $xpath = new DOMXPath($doc); $classname = "list-group"; $events = $xpath->query("//*[contains(@class, '$classname')]"); var_dump($xpath); Can you please check why this is not working? I want to get the data from list-group. Answer 1: The code is
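The answer is truncated in this excerpt, so what follows is a hedged sketch rather than the answerer's fix. It illustrates one observation about the posted code: var_dump() is applied to the DOMXPath object itself, while the matched nodes live in $events, which has to be iterated. The inline $source string is a hypothetical stand-in for the fetched page.

```php
<?php
// Hypothetical markup standing in for the page fetched with file_get_contents().
$source = '<html><body><div class="list-group">'
        . '<a href="/m1">Match 1</a><a href="/m2">Match 2</a>'
        . '</div></body></html>';

libxml_use_internal_errors(true);   // preferred over the @ operator for noisy HTML
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$classname = 'list-group';
$events = $xpath->query("//*[contains(@class, '$classname')]");

// Read data from the matched nodes, not from the DOMXPath object.
$texts = [];
foreach ($events as $event) {
    $texts[] = trim($event->textContent);
}
print_r($texts);
```

Note that contains(@class, 'list-group') also matches classes such as list-group-item, so on a real page the result may include more elements than intended.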

XPATH not working on the HTML

Question: I have code that reads an HTML file from my local web server (localhost) and then converts it to XHTML with tidy. Then I load that XHTML into my DOM. The code looks like this: <?php function getXHTML($html) { $options = array("output-html" => true,"quote-nbsp" => true, "drop-proprietary-attributes" => true,"drop-font-tags" => true,"drop-empty-paras" => true,"hide-comments" => true); $tidy=new tidy(); $xhtml=$tidy->repairString($html,$options); echo $xhtml; return $xhtml; } $content = file

DOMXPath - Not expected behaviour as in the test. Object is not being wrapped. How to solve it?

Question: This is the body of the article I want to manipulate with DOMXPath: This is the code I am using to wrap the <figure> tag: $dom_err = libxml_use_internal_errors(true); $dom = new \DOMDocument('1.0', 'utf-8'); $dom->loadHtml(mb_convert_encoding($arr['body_article'], 'HTML-ENTITIES', 'UTF-8')); $xpath = new \DOMXPath($dom); //dd($xpath); foreach ($xpath->query("//figure") as $img) { $p = $dom->createElement("p"); $p->setAttribute('style', 'text-align:center'); $img->parentNode-
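The posted loop is cut off mid-statement, so the sketch below does not try to complete it. It only shows one common way to wrap an element: put the new wrapper where the element was, then move the element inside the wrapper. The $body string is a hypothetical stand-in for $arr['body_article'], which is not shown.

```php
<?php
// Hypothetical article body; the real $arr['body_article'] is not shown in the excerpt.
$body = '<div><figure><img src="a.jpg" alt=""></figure></div>';

libxml_use_internal_errors(true);   // older libxml2 versions report <figure> as an unknown tag
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($body);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach (iterator_to_array($xpath->query('//figure')) as $figure) {
    $p = $dom->createElement('p');
    $p->setAttribute('style', 'text-align:center');
    $figure->parentNode->replaceChild($p, $figure); // the wrapper takes the figure's place
    $p->appendChild($figure);                       // the figure moves inside the wrapper
}

$wrapped = $xpath->query('//p/figure')->length;
echo $dom->saveHTML($dom->getElementsByTagName('div')->item(0)), "\n";
```

The order of the two calls matters: appending the figure to the wrapper first would detach it from its parent, leaving nothing for replaceChild() to replace.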

DOMXpath | Select the innermost divs

Question: I'm looking for a way to select the innermost divs with PHP, for example: <div> <div> <div> - </div> </div> <div> <div> <div> - </div> </div> </div> </div> The divs containing the - would be selected in the NodeList. I'm using DOMDocument and DOMXPath to go through the HTML; here is an example of one of my methods so you can see how my class is built. public function getkeywords() { foreach($this->Xpath->query('/html/head/meta[@content][@name="keywords"][1]') as $node) { $words = $node-
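As a minimal sketch of the selection described above (not taken from the original thread), "innermost" can be expressed in XPath 1.0 as "a div that has no div descendant". The markup is the example from the question.

```php
<?php
// The nested markup from the question.
$html = '<div> <div> <div> - </div> </div> <div> <div> <div> - </div> </div> </div> </div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// A div is innermost when none of its descendants is a div.
$innermost = $xpath->query('//div[not(descendant::div)]');

foreach ($innermost as $div) {
    echo trim($div->textContent), "\n";
}
```

Using descendant::div rather than a child test (not(div)) matters when a div is nested inside some other element, such as a span, within the outer div.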

How can I remove <br/> if no text comes before or after it? DOMxpath or regex?

Question: How can I remove <br/> if no text comes before or after it? For instance, <p><br/>hello</p> <p>hello<br/></p> should be rewritten as <p>hello</p> <p>hello</p>. Should I use DOMXPath, or would a regex be better? (Note: I posted earlier about removing <p><br/></p> with DOMXPath, and then I came across this issue!) EDIT: If I have this in the input, $content = '<p><br/>hello<br/>hello<br/></p>'; then it should be <p>hello<br/>hello</p> Answer 1: To select the mentioned br you can use:
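The expression from the answer is not included in this excerpt. The sketch below is therefore a separate illustration, under the assumption that "no text before or after" means "no sibling with non-whitespace text content on that side"; it is not the answerer's XPath.

```php
<?php
// The input from the question's EDIT.
$content = '<p><br/>hello<br/>hello<br/></p>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select a <br> inside <p> when nothing with visible text precedes it,
// or nothing with visible text follows it.
$query = '//p/br[not(preceding-sibling::node()[normalize-space()])'
       . ' or not(following-sibling::node()[normalize-space()])]';

// Copy to an array so that removing nodes cannot disturb iteration.
foreach (iterator_to_array($xpath->query($query)) as $br) {
    $br->parentNode->removeChild($br);
}

$p = $doc->getElementsByTagName('p')->item(0);
$brsLeft = $xpath->query('//p/br')->length;
echo $doc->saveHTML($p), "\n";
```

Because the test looks at string content, a sibling element without text (an image, for example) is treated the same as no sibling at all; that is a simplification of this sketch.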

How to convert XML (String) to a valid document?

Question: I have XML as a string and I want to convert it to a DOM document in order to parse it using XPath. I use this code to convert one String element to a DOM element: public Element convert(String xml) throws ParserConfigurationException, SAXException, IOException{ Element sXml = DocumentBuilderFactory .newInstance() .newDocumentBuilder() .parse(new ByteArrayInputStream(xml.getBytes())) .getDocumentElement(); return sXml; } But what if I want to convert a whole XML file? I tried casting but it didn

How do I account for missing xPaths and keep my data uniform when scraping a website using DOMXPath query method?

Question: I am attempting to scrape a website using the DOMXPath query method. I have successfully scraped the 20 profile URLs of the news anchors from this page. $url = "http://www.sandiego6.com/about-us/meet-our-team"; $xPath = "//p[@class='bio']/a/@href"; $html = new DOMDocument(); @$html->loadHtmlFile($url); $xpath = new DOMXPath( $html ); $nodelist = $xpath->query($xPath); $profileurl = array(); foreach ($nodelist as $n){ $value = $n->nodeValue; $profileurl[] = $value; } I used the resulting array
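The rest of the question is cut off, so this is only a sketch of one general approach to the problem named in the title. Instead of running one document-wide query per field (which yields lists of different lengths when a field is missing), it queries each field relative to a record's container node and stores null when nothing matches. The markup, class names and field names below are hypothetical.

```php
<?php
// Hypothetical markup: the second record has no e-mail element.
$page = '<div class="anchor"><h2>Ann</h2><p class="email">ann@example.com</p></div>'
      . '<div class="anchor"><h2>Bob</h2></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];

foreach ($xpath->query("//div[@class='anchor']") as $card) {
    // The second argument makes the query relative to this record's node.
    $nameNode  = $xpath->query('./h2', $card)->item(0);
    $emailNode = $xpath->query("./p[@class='email']", $card)->item(0);

    // item(0) is null when nothing matched, so every row keeps the same keys.
    $rows[] = [
        'name'  => $nameNode  ? trim($nameNode->textContent)  : null,
        'email' => $emailNode ? trim($emailNode->textContent) : null,
    ];
}

print_r($rows);
```

Each row then has the same keys regardless of which elements exist on the page, which keeps the resulting array aligned.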

PHP + Wikipedia: Get content from the first paragraph in a Wikipedia article?

Question: I’m trying to use Wikipedia’s API (api.php) to get the content of a Wikipedia article given its link (for example http://en.wikipedia.org/wiki/Stackoverflow). What I want is the first paragraph (which, for the Stackoverflow article, is: Stack Overflow is a website part of the Stack Exchange network[2][3] featuring questions and answers on a wide range of topics in computer programming.[4][5][6] ). I’m going to do some data manipulation with it. I’ve tried with the