I am using this library (PHP Simple HTML DOM parser) to parse a link, here's the code:
function getSemanticRelevantKeywords($keyword){
$results = array(
Before calling the file_get_html()/load_file() methods, you should first check whether the URL exists.
If the URL exists, you have passed the first step.
(Some servers serve a 404 response as a valid HTML page, with a proper structure such as body, head, etc., but it contains only text like "This page could not be found. 404 error...".)
If the URL returns 200 OK, you should then check whether the fetched result is an object and whether its nodes are set.
That's the code I use in my pages.
function url_exists($url){
    // Prepend a scheme if the URL does not already have one.
    if (strpos($url, "http") === false) $url = "http://" . $url;

    // Fetch only the response headers; @ suppresses warnings for unreachable hosts.
    $headers = @get_headers($url);
    // print_r($headers);

    if (is_array($headers)){
        // The first header line holds the status, e.g. "HTTP/1.1 404 Not Found".
        if (strpos($headers[0], '404 Not Found'))
            return false;
        else
            return true;
    }
    else
        return false;
}
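If you want a stricter test than matching the literal '404 Not Found' string, here is a minimal sketch of an alternative helper (the name url_returns_ok is hypothetical, and it assumes the cURL extension is enabled) that checks the numeric status code instead:

function url_returns_ok($url){
    if (strpos($url, "http") === false) $url = "http://" . $url;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // request headers only, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects to the final page
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // numeric HTTP status code
    curl_close($ch);
    return $status >= 200 && $status < 300;          // treat any 2xx response as OK
}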
$pageAddress = 'http://www.google.com';

if ( url_exists($pageAddress) ) {
    // Create the parser object before loading the remote page.
    $htmlPage = new simple_html_dom();
    $htmlPage->load_file( $pageAddress );
} else {
    echo 'URL does not exist, stopping.';
    return;
}
if ( $htmlPage && is_object($htmlPage) && isset($htmlPage->nodes) )
{
    // do your work here...
} else {
    echo 'Fetched page is not OK, stopping.';
    return;
}