I would like to create a batch script to go through 20,000 links in a DB and weed out all the 404s and such. How would I get the HTTP status code for a remote URL?
If I'm not mistaken, none of PHP's built-in functions returns the HTTP status of a remote URL directly, so the best option would be to use sockets: open a connection to the server, send a request, and parse the status out of the response.
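Something like this (plain http only; $url is the link being checked):
$parts = parse_url($url);
$host = $parts['host'];
$port = isset($parts['port']) ? $parts['port'] : 80;
$path = isset($parts['path']) ? $parts['path'] : '/';
if (isset($parts['query'])) $path .= '?' . $parts['query'];
$timeout = 10; // seconds
// HTTP header lines end in \r\n, and a blank line ends the request
$request = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$fp = fsockopen($host, $port, $errno, $errstr, $timeout);
if (!$fp) die("Could not connect: $errstr ($errno)\n");
fwrite($fp, $request);
$status = null;
while (!feof($fp)) {
    $line = fgets($fp, 4096);
    // the status line comes first, e.g. "HTTP/1.1 404 Not Found"
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $line, $m)) {
        $status = (int) $m[1];
        break; // stop reading, the body is not needed
    }
}
fclose($fp);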
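The parsing step in the loop above could be pulled out into a small function so it can be checked without touching the network. This is only a sketch: the names parse_status_line and is_dead_link are made up for illustration, and treating every 4xx/5xx response as a dead link is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: extract the numeric status code from an HTTP status line.
// Returns null when the line is not a status line.
function parse_status_line($line)
{
    // Matches e.g. "HTTP/1.1 404 Not Found" or "HTTP/1.0 200 OK"
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', trim($line), $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: treat 4xx/5xx codes and unparseable responses as dead.
// This cut-off is an assumption, not a rule.
function is_dead_link($status)
{
    return $status === null || $status >= 400;
}
```

With these, parse_status_line("HTTP/1.1 404 Not Found\r\n") gives 404 and is_dead_link(404) is true. A 3xx code only says the link redirects, so for those you would probably want to check the Location target before deciding.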
Another option is to use an existing HTTP client class for PHP that can return the headers without fetching the full page content. There should be a few open source classes available on the net.
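As one possible way to get the headers without the body, here is a sketch that uses PHP's cURL extension rather than a third-party class. It assumes the extension is installed; the function name http_status and the 10 second default timeout are made up for illustration.

```php
<?php
// Sketch: ask for the headers only (HEAD request) and return the status code.
// Returns null if the request failed outright (bad URL, DNS failure,
// refused connection, timeout). Assumes the cURL extension is available.
function http_status($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true,     // send HEAD, do not download the body
        CURLOPT_RETURNTRANSFER => true,     // return output instead of echoing it
        CURLOPT_FOLLOWLOCATION => true,     // report the status after redirects
        CURLOPT_MAXREDIRS      => 5,        // give up on redirect loops
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole request
    ));
    $ok = curl_exec($ch);
    if ($ok === false) {
        return null;
    }
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

In the batch script you would call http_status($url) for each row and flag the ones that come back null or 400 and above. Some servers answer HEAD with 405 or behave differently than for GET, so retrying those with a normal GET before marking the link as dead is probably worthwhile.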