Web crawler links/page logic in PHP
Question: I'm writing a basic crawler in PHP that simply caches pages. All it does is use file_get_contents to fetch the contents of a web page and a regex to pull out all the links of the form <a href="URL">DESCRIPTION</a>. At the moment it returns: Array ( [url] => URL [desc] => DESCRIPTION ) The problem I'm having is working out the logic for determining whether a link is local to the site, and whether it points to a completely different directory on the same site. It could be any number of combinations, e.g. href="..