I am trying to scrape data from a webpage, and I need to get all the data at this link.
include 'simple_html_dom.php';
$html1 = file_get_html('ht
From a quick glance, you need to loop through the <dl> tags in #content, then through the dt and dd elements inside each one.
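foreach ($html1->find('#content dl') as $item) {
    // Each <dl> holds the <dt> labels and their <dd> values
    $info = $item->find('dd');
    foreach ($info as $info_item) {
        echo $info_item->plaintext, "\n"; // text of each <dd>
    }
}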
This uses the simple_html_dom library.
@zero: there is a good site for trying out scraping with both PHP and Python. It has been pretty helpful, at least to me: http://scraperwiki.com/
This seems to be covered in the documentation:
$html1->find('b[class=info]',0)->innertext;
XPath makes scraping ridiculously easy, and a well-written query keeps working through some changes to the HTML document. For example, to pull out the names, you'd use a query that looks like:
//div[@id='content']/dl/dt
A simple Google search will turn up plenty of tutorials.
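As a rough sketch of how such a query could be run from PHP, the snippet below uses the built-in DOMDocument and DOMXPath classes. The markup in $sampleHtml is invented for illustration, since the real page is not shown in this thread; it only assumes a div with id "content" containing a dl of dt/dd pairs.

```php
<?php
// Hypothetical markup standing in for the real page (an assumption).
$sampleHtml = <<<HTML
<div id="content">
  <dl>
    <dt>Alice</dt><dd>Engineer</dd>
    <dt>Bob</dt><dd>Designer</dd>
  </dl>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

// Select every <dt> that sits directly inside a <dl> under div#content.
$xpath = new DOMXPath($doc);
$names = [];
foreach ($xpath->query("//div[@id='content']/dl/dt") as $dt) {
    $names[] = trim($dt->textContent);
}

print_r($names);
```

Note the @ before id: without it, XPath looks for a child element named id rather than an attribute.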
I'd use WWW::Mechanize:
http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
The links you provided are down. I suggest using PHP's native DOM extension instead of the "simple html dom" parser; it will be much faster and easier ;) I had a look at the page using the Google cache, and you can use something like:
$doc = new DOMDocument;
@$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
$contents = $doc->getElementById('content')->nodeValue; // Text contents of #content
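If you need each value matched to its label rather than one block of text, something along these lines might work. It is only a sketch: $sampleHtml is made-up markup (the real page structure is an assumption here), and $pairs is just an illustrative name for the result.

```php
<?php
// Hypothetical markup standing in for the real page (an assumption).
$sampleHtml = <<<HTML
<div id="content">
  <dl>
    <dt>Alice</dt><dd>Engineer</dd>
    <dt>Bob</dt><dd>Designer</dd>
  </dl>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // an alternative to @ for silencing parse warnings
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$pairs = [];
$content = $doc->getElementById('content');
if ($content !== null) {
    foreach ($content->getElementsByTagName('dl') as $dl) {
        $term = null;
        // Walk the children in order so each <dd> is filed under the preceding <dt>.
        foreach ($dl->childNodes as $node) {
            if ($node->nodeName === 'dt') {
                $term = trim($node->textContent);
            } elseif ($node->nodeName === 'dd' && $term !== null) {
                $pairs[$term][] = trim($node->textContent);
            }
        }
    }
}

print_r($pairs);
```

Each label maps to an array because a single dt may be followed by more than one dd.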