simple-html-dom | 易学教程

Scrape data using regex and simplehtmldom

Question: I am trying to scrape some data from this site: http://laperuanavegana.wordpress.com/. Specifically, I want the title of each recipe and its ingredients. The ingredients are located between two specific keywords. I am trying to get this data using regex and simplehtmldom, but it shows the full HTML text, not just the ingredients. Here is my code: include_once('simple_html_dom.php'); $base_url = "http://laperuanavegana.wordpress.com/"; traverse($base_url); function traverse($base_url) { $html = file_get

Simple HTML DOM returning false

Question: I've encountered something strange when using Simple HTML DOM to parse a webpage with a certain query string. Some query strings work when trying to parse this used car page of a dealership's website, but others do not. It seems that whenever there are more vehicles to be shown on the page, it will not return the HTML content (meaning that on the last page of pagination it works, otherwise it doesn't). Just wondering if anyone has any ideas. I've tried viewing the page with

Google search results with php

Question: I'm using the following PHP script to get search results from Google. include("simple_html_dom.php"); include("random-user-agent.php"); $query = 'facebook'; $curl = curl_init(); curl_setopt($curl, CURLOPT_URL, 'http://www.google.com/search?q='.$query.''); #curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE); curl_setopt($curl, CURLOPT_USERAGENT,random_user_agent()); $str = curl_exec($curl); curl_close($curl); $html= str_get_html($str); $i = 0;

Simple HTML DOM returning string instead of array

Question: I have the following code: $html3->find('TR[d=lt]',0); for the following source code: <TR> <TH NOWRAP ALIGN=RIGHT VALIGN=TOP> Date:</TH> <TD d="lt"> 2011-05-31 </TD> </TR> <TR> <TH NOWRAP ALIGN=RIGHT VALIGN=TOP>Title:</TH> <TD d="lt"> NETWORKS</TD> </TR> <TR> <TH NOWRAP ALIGN=RIGHT VALIGN=TOP>Title:</TH> <TD d="lt"> Low NETWORKS</TD> </TR> <TR> <TH NOWRAP ALIGN=RIGHT VALIGN=TOP>Description:</TH> <TD d="lt"> CD</TD> </TR> However, the code only returns the DATE as a string instead of the an

PHP Simple HTML DOM Parser Call to a member function children() on a non-object

Question: I'm pretty new to PHP, and I am trying to use Simple HTML DOM Parser to get the information I need from a website. Here is the sample node: <div> <h1>Hello</h1> <figure id="XXX"> <div class="abc">ABC</div> <div class="qwe">QWE</div> <div class="zxc">ZXC</div> </figure> </div> $element contains the above node. What I need to get is "QWE", so I tried: $name = $element->find('figure[id=XXX]')->children(1)->innertext; but now I get this problem: Fatal error: Call to a member function children() on a non-object ,
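One likely cause, assuming the usual Simple HTML DOM behaviour: find() called without an index returns an array of matches, so children() ends up being called on an array rather than an element. Passing an index, as in find('figure[id=XXX]', 0), returns a single element instead. Because that library does not ship with PHP, the sketch below shows the same "select one node, then take its second child" idea with PHP's built-in DOMDocument and DOMXPath. This is a swapped-in technique, not the library's API, and the function name second_child_text is hypothetical.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the text of the
// second child element of <figure id="...">, or null when nothing matches.
// Assumes $figureId contains no single quote, since it is placed in an XPath literal.
function second_child_text(string $html, string $figureId): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally (e.g. for HTML5 tags such as <figure>).
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath positions are 1-based, so *[2] selects the second child element.
    $nodes = $xpath->query("//figure[@id='" . $figureId . "']/*[2]");
    if ($nodes === false || $nodes->length === 0) {
        return null; // guard against the "non-object" situation
    }
    return trim($nodes->item(0)->textContent);
}

$html = '<div><h1>Hello</h1><figure id="XXX">'
      . '<div class="abc">ABC</div><div class="qwe">QWE</div>'
      . '<div class="zxc">ZXC</div></figure></div>';

echo second_child_text($html, 'XXX'), "\n";
```

The explicit null return mirrors the check that is worth making in either library: confirm that a node was actually found before calling methods on it.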

PHP Simple HTML DOM - Get text inside <td> tag

Question: So I have this .html file that I have to analyze. In that file I have lines like this one: <tr> <td colspan=1 rowspan=1 bgcolor=#ffffff align=left valign=top> <font size=1 face="Tahoma" color=#000000> <nobr> 240,0000 </nobr> </font> </td> <td colspan=1 rowspan=1 bgcolor=#ffffff align=left valign=top> <font size=1 face="Tahoma" color=#000000> <nobr> 340,0000 </nobr> </font> </td> </tr> What I need to get is 240,0000, 340,0000 and so on. I have tried something like this: // Create DOM from URL
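As a rough illustration of pulling those cell values out, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM (a swapped-in technique). The function name nobr_values and the wrapping table element in the sample markup are assumptions made for the example.

```php
<?php
// Hypothetical helper: collects the trimmed text of every <nobr> nested in a <td>.
function nobr_values(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate legacy markup such as <font> and <nobr>
    $doc->loadHTML($html);
    libxml_clear_errors();

    $values = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//td//nobr') as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Sample row modelled on the question; the <table> wrapper is an assumption.
$html = '<table><tr>'
      . '<td align=left valign=top><font size=1 face="Tahoma"><nobr> 240,0000 </nobr></font></td>'
      . '<td align=left valign=top><font size=1 face="Tahoma"><nobr> 340,0000 </nobr></font></td>'
      . '</tr></table>';

print_r(nobr_values($html));
```

Selecting on the innermost element and trimming its text avoids having to strip the surrounding font markup by hand.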

Get language of a website using simple html dom

Question: I am building a search engine and web crawler using PHP, and I would like to detect the language of a website. How would I detect the language of a page by: checking the URL for a lang parameter, as in https://twitter.com/?lang=jap; if that is not set, checking the URL itself, as in https://www.google.co.jp/; and if I still can't find anything, setting the default to English? The code I have so far for scraping pages is: function crawl($url){ $html = file_get_html($url); if($html && is_object($html) && isset(
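The three-step fallback described in the question could be sketched as follows, using only PHP's built-in parse_url and parse_str. The function name detect_language and the small TLD-to-language map are hypothetical, and reading the second step as "look at the country-code top-level domain" is an assumption about what the asker intends.

```php
<?php
// Hypothetical map from country-code TLD to a language code; extend as needed.
const TLD_LANGUAGES = ['jp' => 'ja', 'de' => 'de', 'fr' => 'fr'];

// Hypothetical helper implementing: ?lang= parameter, then TLD, then default.
function detect_language(string $url, string $default = 'en'): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $default; // seriously malformed URL
    }

    // 1. An explicit ?lang=... query parameter wins.
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        if (isset($query['lang']) && is_string($query['lang']) && $query['lang'] !== '') {
            return strtolower($query['lang']);
        }
    }

    // 2. Otherwise look up the last label of the host name (the TLD).
    if (isset($parts['host'])) {
        $labels = explode('.', strtolower($parts['host']));
        $tld = end($labels);
        if (array_key_exists($tld, TLD_LANGUAGES)) {
            return TLD_LANGUAGES[$tld];
        }
    }

    // 3. Nothing found: fall back to the default.
    return $default;
}

echo detect_language('https://twitter.com/?lang=jap'), "\n";
echo detect_language('https://www.google.co.jp/'), "\n";
echo detect_language('https://example.com/'), "\n";
```

The value of the lang parameter is returned as given (lower-cased), so a site-specific code such as "jap" would still need mapping to a standard language code if the crawler requires one.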

Get all images and return the src [duplicate]

Question: My code below grabs some content. In this content there are some photos. How can I loop through this content, find all images and return their src? My code so far: $items = $html->find('div.post-single-content',0)->children(1)->outertext; foreach($items $node) { $node->find('img'); } print_r ($node); Answer 1: Don't use regex, use a parser. Example:
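A minimal sketch of the parser-based approach, written with PHP's built-in DOMDocument; it is an independent illustration and not the example from the answer above. The function name image_sources and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the src attribute of every <img> in the markup.
function image_sources(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Skip images that carry no src attribute at all.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Sample content; the file names are made up for the example.
$html = '<div class="post-single-content"><p><img src="a.jpg" alt="A"></p>'
      . '<p><img src="b.png"></p><p><img alt="no source"></p></div>';

print_r(image_sources($html));
```

With Simple HTML DOM itself, the comparable loop would iterate over the elements returned by find('img') and read each element's src property, rather than calling find() on an outertext string.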

Unable to print links in another function

Question: I've written some code in PHP to scrape some preferred links from the main page of Wikipedia. When I execute my script, the links come through as expected. However, I've now defined two functions within my script in order to learn how to pass links from one function to another. My goal is to print the links in the latter function, but it only prints the first link and nothing else. If I use only this function fetch_wiki_links(), I can get several links, but when I try

Authorize with curl and parse using simple html dom not working

Question: I'm trying to read an HTML page that requires a login, using simple html dom. For example, http://example.com/login/ is the login page and http://example.com/page/ is the page I need to parse. So I used curl to do the login and simple html dom to parse. But I don't know whether my login succeeded, because when I display the response from curl, it is the login page's contents! I searched through almost all related questions on Stack Overflow for many hours but I couldn't find what