Question
I am writing a web crawler in PHP. Given the current URL and an array of links that may be absolute, relative, or root-relative URLs, how would I determine the fully-qualified URL for each link?
For example, let's say I am crawling the URL:
http://www.example.com/path/to/my/file.html
And the array of links that the webpage contains is:
array(
'http://www.some-other-domain.com/',
'../../',
'/search',
);
How would I determine the fully-qualified URL for each of those links? The result I am looking for in this example would be, respectively:
http://www.some-other-domain.com/
http://www.example.com/path/
http://www.example.com/search
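If you would rather not depend on a third-party library, the standard procedure is the reference-resolution algorithm in RFC 3986, section 5.2: split both URLs into components, take the scheme, authority, and path from the link or the base as the rules dictate, then collapse `.` and `..` segments. Below is a minimal sketch of that algorithm, assuming PHP 8+ and well-formed input; it relies on the built-in `parse_url()` to split URLs, so it inherits that function's quirks. The function names `resolve_url`, `remove_dot_segments`, and `build_authority` are hypothetical names chosen for this example, not part of any library.

```php
<?php
// Sketch of RFC 3986 section 5.2 reference resolution (assumes PHP 8+).
// Hypothetical helper names; URL splitting is delegated to parse_url().

// Collapse "." and ".." segments (RFC 3986 section 5.2.4).
function remove_dot_segments(string $input): string
{
    $output = '';
    while ($input !== '') {
        if (str_starts_with($input, '../')) {
            $input = substr($input, 3);
        } elseif (str_starts_with($input, './')) {
            $input = substr($input, 2);
        } elseif (str_starts_with($input, '/./')) {
            $input = substr($input, 2);            // "/./x" becomes "/x"
        } elseif ($input === '/.') {
            $input = '/';
        } elseif (str_starts_with($input, '/../') || $input === '/..') {
            $input = $input === '/..' ? '/' : substr($input, 3);
            // Drop the last segment (and its leading "/") from the output.
            $slash = strrpos($output, '/');
            $output = $slash === false ? '' : substr($output, 0, $slash);
        } elseif ($input === '.' || $input === '..') {
            $input = '';
        } else {
            // Move the first segment, including any leading "/", to the output.
            $pos = strpos($input, '/', 1);
            $cut = $pos === false ? strlen($input) : $pos;
            $output .= substr($input, 0, $cut);
            $input = substr($input, $cut);
        }
    }
    return $output;
}

// Rebuild "user:pass@host:port" from parse_url() parts, or null if no host.
function build_authority(array $p): ?string
{
    if (!isset($p['host'])) {
        return null;
    }
    $auth = '';
    if (isset($p['user'])) {
        $auth .= $p['user'] . (isset($p['pass']) ? ':' . $p['pass'] : '') . '@';
    }
    $auth .= $p['host'];
    if (isset($p['port'])) {
        $auth .= ':' . $p['port'];
    }
    return $auth;
}

// Resolve $ref against $base (RFC 3986 section 5.2.2).
function resolve_url(string $base, string $ref): string
{
    $b = parse_url($base);
    $r = parse_url($ref);
    if ($b === false || $r === false) {
        throw new InvalidArgumentException('Malformed URL');
    }
    $rPath = $r['path'] ?? '';
    $query = $r['query'] ?? null;

    if (isset($r['scheme'])) {                  // already absolute
        $scheme = $r['scheme'];
        $authority = build_authority($r);
        $path = remove_dot_segments($rPath);
    } else {
        $scheme = $b['scheme'] ?? null;
        if (isset($r['host'])) {                // network-path reference "//host/..."
            $authority = build_authority($r);
            $path = remove_dot_segments($rPath);
        } else {
            $authority = build_authority($b);
            $bPath = $b['path'] ?? '';
            if ($rPath === '') {                // "", "?q" or "#frag"
                $path = $bPath;
                $query = $r['query'] ?? ($b['query'] ?? null);
            } elseif ($rPath[0] === '/') {      // root-relative
                $path = remove_dot_segments($rPath);
            } else {                            // relative: merge with base directory
                if ($authority !== null && $bPath === '') {
                    $merged = '/' . $rPath;
                } else {
                    $slash = strrpos($bPath, '/');
                    $merged = ($slash === false ? '' : substr($bPath, 0, $slash + 1)) . $rPath;
                }
                $path = remove_dot_segments($merged);
            }
        }
    }

    $url = $scheme !== null ? $scheme . ':' : '';
    if ($authority !== null) {
        $url .= '//' . $authority;
    }
    $url .= $path;
    if ($query !== null) {
        $url .= '?' . $query;
    }
    if (isset($r['fragment'])) {
        $url .= '#' . $r['fragment'];
    }
    return $url;
}
```

For the base URL above, `resolve_url('http://www.example.com/path/to/my/file.html', '../../')` gives `http://www.example.com/path/`. Note that under RFC 3986 `/search` resolves to `http://www.example.com/search`, without a trailing slash; a server may redirect that to a slash-terminated form, but the resolver itself should not add one.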
Answer 1:
I think the easiest way is to use an existing library, such as the one described here: http://www.electrictoolbox.com/php-resolve-relative-urls-absolute/
Examples from the link:
url_to_absolute('http://www.example.com/sitemap.html', 'aboutus.html');
resolves to http://www.example.com/aboutus.html
or
url_to_absolute('http://www.example.com/content/sitemap.html', '../images/somephoto.jpg');
resolves to http://www.example.com/images/somephoto.jpg
Source: https://stackoverflow.com/questions/28314414/how-to-get-fully-qualified-url-from-anchor-href