domxpath

Too long xpath with DOMXpath query/evaluate return nothing

旧街凉风 submitted on 2019-12-13 03:06:09
Question: I am using PHP to retrieve content for a given URL and XPath. I use DOMDocument / DOMXPath (with query or evaluate). For a short XPath I obtain the correct result, but for a longer XPath it does not work, even though the XPath seems to be valid (I obtained it with XPather (a Firefox plugin) and retested it with YQL). Do you have any advice on this curious problem? Example of code: $doc = new DOMDocument(); $myXMLString = file_get_contents('http://stackoverflow.com/questions/4097230/too-long-xpath-with

Replace all images in HTML with text

一曲冷凌霜 submitted on 2019-12-12 21:09:26
Question: I am trying to replace all images in some HTML which meet specific requirements with the appropriate text. The specific requirements are that they are of class "replaceMe" and the image src filename is in $myArray. From searching for solutions, it appears that some sort of PHP DOM technique is appropriate; however, I am very new to this. For instance, given $html, I wish to return $desired_html. At the bottom of this post is my attempted implementation, which currently doesn't work. Thank
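
The requirement described above (class "replaceMe" plus a src filename found in $myArray) can be sketched roughly as follows. This is only an illustration, not the asker's implementation: the function name replaceImages is hypothetical, and it assumes $myArray maps each filename to its replacement text, which the excerpt does not confirm.

```php
<?php
/**
 * Replace every <img class="replaceMe"> whose src filename is a key of
 * $replacements with the mapped text. Other images are left untouched.
 * Assumption: $replacements has the shape ['file.png' => 'text'].
 */
function replaceImages(string $html, array $replacements): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "replaceMe" as a whole class token, not as a substring.
    $query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' replaceMe ')]";

    // Copy the node list first because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query($query)) as $img) {
        $path = (string) parse_url($img->getAttribute('src'), PHP_URL_PATH);
        $file = basename($path);
        if (array_key_exists($file, $replacements)) {
            $text = $doc->createTextNode($replacements[$file]);
            $img->parentNode->replaceChild($text, $img);
        }
    }

    // Serialize only the children of <body>, dropping the wrapper libxml adds.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample input.
$html = '<p><img class="replaceMe" src="images/a.png"> and <img class="keep" src="images/a.png"></p>';
echo replaceImages($html, ['a.png' => 'Alpha']), "\n";
```

Only the first image is replaced here, because the second one does not carry the "replaceMe" class.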

DOMXPath object value omitted

做~自己de王妃 submitted on 2019-12-12 03:55:39
Question: I have read many Stack Overflow questions and I'm using this code, but I don't know why it does not work. Here is the code. $url = 'http://m.cricbuzz.com/cricket-schedule'; $source = file_get_contents($url); $doc = new DOMDocument; @$doc->loadHTML($source); $xpath = new DOMXPath($doc); $classname = "list-group"; $events = $xpath->query("//*[contains(@class, '$classname')]"); var_dump($xpath); Can you please check why this is not working? I actually want to get the data from list-group. Answer 1: The code is
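
Note that the snippet above dumps the DOMXPath object itself ($xpath) rather than the node list returned by query() ($events). A minimal sketch of reading the matched nodes instead is shown below. It is not the original answer: the function name textByClass and the inline sample markup are hypothetical stand-ins for the fetched page.

```php
<?php
/**
 * Return the trimmed text of every element whose class attribute
 * contains $classname (substring match, as in the question's query).
 */
function textByClass(string $html, string $classname): array
{
    $previous = libxml_use_internal_errors(true); // instead of the @ operator
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $events = $xpath->query("//*[contains(@class, '$classname')]");

    $texts = [];
    // Iterate over the DOMNodeList; each item is a DOMNode.
    foreach ($events as $event) {
        $texts[] = trim($event->textContent);
    }
    return $texts;
}

// Hypothetical sample markup in place of file_get_contents($url).
$source = '<div class="list-group">First event</div>'
        . '<div class="list-group item">Second event</div>';
print_r(textByClass($source, 'list-group'));
```

Iterating over $events (or calling var_dump on individual nodes' textContent) shows the data; dumping the DOMXPath object does not.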

XPATH not working on the HTML

别来无恙 submitted on 2019-12-11 19:24:01
Question: I have code that reads an HTML file from my local web server (localhost) and then converts it to XHTML with tidy. Then I load that XHTML into my DOM. The code looks like this: <?php function getXHTML($html) { $options = array("output-html" => true,"quote-nbsp" => true, "drop-proprietary-attributes" => true,"drop-font-tags" => true,"drop-empty-paras" => true,"hide-comments" => true); $tidy=new tidy(); $xhtml=$tidy->repairString($html,$options); echo $xhtml; return $xhtml; } $content = file

DOMXPath - Not expected behaviour as in the test. Object is not being wrapped. How to solve it?

梦想与她 submitted on 2019-12-11 14:30:37
Question: This is the body of the article I want to manipulate with DOMXPath: This is the code I am using in order to wrap the <figure> tag: $dom_err = libxml_use_internal_errors(true); $dom = new \DOMDocument('1.0', 'utf-8'); $dom->loadHtml(mb_convert_encoding($arr['body_article'], 'HTML-ENTITIES', 'UTF-8')); $xpath = new \DOMXPath($dom); //dd($xpath); foreach ($xpath->query("//figure") as $img) { $p = $dom->createElement("p"); $p->setAttribute('style', 'text-align:center'); $img->parentNode-
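
One common way to wrap a node in a new parent is to swap the new element into the node's position and then append the node to it. The sketch below illustrates that pattern under stated assumptions: the function name wrapFigures and the sample markup are hypothetical, and the excerpt's own loop body is cut off, so this is not necessarily what the asker wrote.

```php
<?php
/**
 * Wrap every <figure> in a centered <p>, returning the modified body HTML.
 */
function wrapFigures(string $bodyHtml): string
{
    $previous = libxml_use_internal_errors(true); // <figure> triggers parser warnings
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->loadHTML($bodyHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Copy the result first because the loop changes the tree.
    foreach (iterator_to_array($xpath->query('//figure')) as $figure) {
        $p = $dom->createElement('p');
        $p->setAttribute('style', 'text-align:center');
        // Put <p> where <figure> was, then move <figure> inside it.
        $figure->parentNode->replaceChild($p, $figure);
        $p->appendChild($figure);
    }

    // Serialize only the children of <body>.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample article body.
echo wrapFigures('<figure><img src="a.png"></figure><p>text</p>'), "\n";
```

The order matters: calling appendChild on the new <p> before it is attached to the document would move the figure out of the tree without leaving the wrapper in its place.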

DOMXpath | Select the innermost divs

主宰稳场 submitted on 2019-12-10 14:33:48
Question: I'm looking for a way to select the innermost div with PHP, for example: <div> <div> <div> - </div> </div> <div> <div> <div> - </div> </div> </div> </div> The DIVs containing the - would be selected in the NodeList. I'm using DOMDocument and DOMXPath to go through the HTML; here's an example of one of my methods so you can see the way my class is created. public function getkeywords() { foreach($this->Xpath->query('/html/head/meta[@content][@name="keywords"][1]') as $node) { $words = $node-
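
An "innermost" div can be expressed in XPath 1.0 as a div that has no div descendants. The following is a minimal sketch of that idea using the example markup from the question; the function name innermostDivText is hypothetical.

```php
<?php
/**
 * Return the trimmed text of every <div> that contains no other <div>.
 */
function innermostDivText(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // not(.//div) is true only when the current div has no div descendant.
    foreach ($xpath->query('//div[not(.//div)]') as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

$html = '<div> <div> <div> - </div> </div> <div> <div> <div> - </div> </div> </div> </div>';
print_r(innermostDivText($html)); // two entries, one per innermost div
```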

How can I remove <br/> if no text comes before or after it? DOMxpath or regex?

人盡茶涼 submitted on 2019-12-10 13:45:15
Question: How can I remove <br/> if no text comes before or after it? For instance, <p><br/>hello</p> <p>hello<br/></p> should be rewritten like this: <p>hello</p> <p>hello</p> Should I use DOMXPath, or would regex be better? (Note: I posted earlier about removing <p><br/></p> with DOMXPath, and then I came across this issue!) EDIT: If I have this in the input, $content = '<p><br/>hello<br/>hello<br/></p>'; then it should be <p>hello<br/>hello</p> Answer 1: To select the mentioned br you can use:
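
The expression from the original answer is cut off above, so the sketch below is an independent illustration rather than that answer. It assumes a br counts as "edge" when it is the first or last child node of its <p>, which ignores cases with surrounding whitespace-only text nodes; the function name stripEdgeBreaks is hypothetical.

```php
<?php
/**
 * Remove <br> elements that are the first or last child node of a <p>.
 */
function stripEdgeBreaks(string $content): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // node()[1] / node()[last()] are evaluated per <p>, so this selects a
    // leading or trailing br in each paragraph.
    $query = '//p/node()[1][self::br] | //p/node()[last()][self::br]';

    // Copy the node list before removing nodes from the tree.
    foreach (iterator_to_array($xpath->query($query)) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of <body>.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripEdgeBreaks('<p><br/>hello<br/>hello<br/></p>'), "\n";
```

Note that saveHTML() serializes the remaining break as <br> rather than <br/>, since it emits HTML syntax, not XHTML.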

How to convert XML (String) to a valid document?

≡放荡痞女 submitted on 2019-12-10 11:48:42
Question: I have XML as a string and I want to convert it to a DOM document in order to parse it using XPath. I use this code to convert one String element to a DOM element: public Element convert(String xml) throws ParserConfigurationException, SAXException, IOException{ Element sXml = DocumentBuilderFactory .newInstance() .newDocumentBuilder() .parse(new ByteArrayInputStream(xml.getBytes())) .getDocumentElement(); return sXml; } But what if I want to convert a whole XML file? I tried casting but it didn

How do I account for missing xPaths and keep my data uniform when scraping a website using DOMXPath query method?

故事扮演 submitted on 2019-12-08 10:36:18
Question: I am attempting to scrape a website using the DOMXPath query method. I have successfully scraped the 20 profile URLs of each news anchor from this page. $url = "http://www.sandiego6.com/about-us/meet-our-team"; $xPath = "//p[@class='bio']/a/@href"; $html = new DOMDocument(); @$html->loadHtmlFile($url); $xpath = new DOMXPath( $html ); $nodelist = $xpath->query($xPath); $profileurl = array(); foreach ($nodelist as $n){ $value = $n->nodeValue; $profileurl[] = $value; } I used the resulting array
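
One way to keep scraped rows uniform when some XPaths match nothing is to query each field separately and store a placeholder (here null) for misses. The sketch below illustrates this under stated assumptions: the function name extractFields, the field names, the XPaths, and the sample markup are all hypothetical and are not taken from the site in the question.

```php
<?php
/**
 * Evaluate one XPath per field and always return every field,
 * using null when the XPath matches no node.
 */
function extractFields(string $html, array $fieldPaths): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $row = [];
    foreach ($fieldPaths as $field => $path) {
        $nodes = $xpath->query($path);
        // query() returns false for an invalid expression and an empty
        // list when nothing matches; both cases become null.
        $row[$field] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->nodeValue)
            : null;
    }
    return $row;
}

// Hypothetical profile page that has a name but no e-mail link.
$profileHtml = '<div class="profile"><h1 class="name">Jane Doe</h1></div>';
$paths = [
    'name'  => "//h1[@class='name']",
    'email' => "//a[@class='email']/@href",
];
print_r(extractFields($profileHtml, $paths));
```

Because every row carries the same keys, the rows can be stored or exported as a table without columns shifting when a profile lacks a field.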

PHP + Wikipedia: Get content from the first paragraph in a Wikipedia article?

只谈情不闲聊 submitted on 2019-12-07 19:36:49
Question: I'm trying to use Wikipedia's API (api.php) to get the content of a Wikipedia article provided by a link (like: http://en.wikipedia.org/wiki/Stackoverflow). What I want is to get the first paragraph (which in the example of the Stack Overflow wiki article is: Stack Overflow is a website part of the Stack Exchange network[2][3] featuring questions and answers on a wide range of topics in computer programming.[4][5][6] ). I'm going to do some data manipulation with it. I've tried with the