How to scrape specific data from a webpage with Simple HTML DOM Parser

北恋 2020-12-18 01:21

I am trying to scrape data from a webpage, but I need to get all the data in this link.

include 'simple_html_dom.php';
$html1 = file_get_html('ht


        
6 Answers
  • 2020-12-18 01:50

    From a quick glance, you need to loop through the <dl> tags in #content, then through the dt and dd elements inside each one.

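    // $html1 is the object returned by file_get_html() in the question
    foreach ($html1->find('#content dl') as $item) {
        // Each <dl> holds the <dt> labels and their <dd> values
        foreach ($item->find('dd') as $info_item) {
            echo $info_item->plaintext, "\n";
        }
    }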
    

    This uses the simple_html_dom library.

  • 2020-12-18 01:57

    @zero: there is a good site for trying out scraping in both PHP and Python. It has been pretty helpful, at least to me: http://scraperwiki.com/

  • 2020-12-18 02:03

    Seems to be written in the documentation:

    $html1->find('b[class=info]',0)->innertext;
    
  • 2020-12-18 02:08

    XPath makes scraping ridiculously easy, and a well-chosen query keeps working through some changes to the HTML document. For example, to pull out the names, you'd use a query that looks like:

    //div[@id='content']/dl/dt
    

    A simple Google search will give you plenty of tutorials.
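
    As a rough illustration, here is a minimal sketch of running such a query with PHP's built-in DOMXPath class. The query is written as //div[@id='content']/dl/dt, since XPath attribute tests need the @ prefix and the list element is dl. The sample markup and the names $sampleHtml and $names are made up for the example, because the page from the question is not available; the real structure may differ.

    ```php
    <?php
    // Hypothetical sample markup standing in for the unavailable page.
    $sampleHtml = '<html><body><div id="content">'
        . '<dl><dt>Name A</dt><dd>Info A</dd></dl>'
        . '<dl><dt>Name B</dt><dd>Info B</dd></dl>'
        . '</div></body></html>';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($sampleHtml);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Select every <dt> that sits in a <dl> directly under <div id="content">.
    $names = [];
    foreach ($xpath->query("//div[@id='content']/dl/dt") as $dt) {
        $names[] = trim($dt->textContent);
    }

    print_r($names);
    ```

    If the <dl> elements are nested deeper inside #content, //div[@id='content']//dl/dt would match them at any depth.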

  • 2020-12-18 02:09

    I'd use WWW::Mechanize

    http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm

  • 2020-12-18 02:17

    The links you provided are down. I suggest using PHP's native DOM extension instead of Simple HTML DOM Parser; it will be much faster and easier ;) I had a look at the page through the Google cache, and you can use something like:

    $doc = new DOMDocument;
    @$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
    $contents = $doc->getElementById('content')->nodeValue; // Text contents of #content
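
    To get the individual entries rather than one block of text, the same extension can walk the dt and dd elements inside #content. This is only a sketch under assumptions: extractPairs() is a hypothetical helper name, the sample markup is invented, and it assumes each dd follows the dt it belongs to.

    ```php
    <?php
    /**
     * Collect the <dt>/<dd> pairs found inside the element with id "content".
     * extractPairs() is a hypothetical helper, not part of the DOM extension.
     */
    function extractPairs(string $html): array
    {
        $doc = new DOMDocument();
        libxml_use_internal_errors(true); // buffer parse warnings instead of using @
        $doc->loadHTML($html);
        libxml_clear_errors();

        $content = $doc->getElementById('content');
        if ($content === null) {
            return []; // the page has no #content element
        }

        $pairs = [];
        foreach ($content->getElementsByTagName('dl') as $dl) {
            $label = null;
            foreach ($dl->childNodes as $node) {
                if ($node->nodeName === 'dt') {
                    $label = trim($node->textContent); // remember the current label
                } elseif ($node->nodeName === 'dd' && $label !== null) {
                    $pairs[] = [$label, trim($node->textContent)];
                }
            }
        }
        return $pairs;
    }

    // Made-up sample markup standing in for the unavailable page.
    $sample = '<div id="content"><dl><dt>Name</dt><dd>Alice</dd>'
        . '<dt>City</dt><dd>Paris</dd></dl></div>';
    $pairs = extractPairs($sample);
    print_r($pairs);
    ```

    The null check matters because getElementById() returns null when the id is missing, and reading nodeValue on null would fail.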
    