Question
I'm using this script to scrape a website:
<?php
$url = "http://www.nu.nl";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
?>
In the output, the JavaScript and CSS files in the head section resolve against the wrong domain. So I tried to fix it with:
$url = preg_replace("/<head>/i", "<head><base href='$url' />", $url, 1);
It doesn't work. Any ideas why? I can't spot anything.
Answer 1:
What about using the right variables? $curl_scraped_page is your page and $url is your URL... But you passed $url to preg_replace as the subject, so the replacement ran on the URL string instead of the scraped HTML.
$curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);
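If the page's head tag carries attributes, or the URL contains characters that need escaping, the one-liner above may not be enough. Here is a minimal sketch of a slightly more defensive variant. The helper name inject_base_href and the sample HTML are hypothetical and not part of the original post. It assumes only the first opening head tag should be touched.

```php
<?php
// Hypothetical helper (not from the original post): inserts a <base> tag
// right after the first opening <head> tag so that relative URLs in the
// scraped page resolve against $baseUrl.
function inject_base_href(string $html, string $baseUrl): string
{
    // Escape the URL so it is safe inside a double-quoted attribute.
    $base = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '" />';

    // Match <head> with or without attributes, case-insensitively.
    // \b stops the pattern from matching <header>.
    // A callback is used so "$" or "\" in the URL is never read as a backreference.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base): string {
            return $m[0] . $base;
        },
        $html,
        1 // only the first match
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

// Sample input standing in for $curl_scraped_page.
$html = '<html><HEAD lang="en"><link href="/style.css"></HEAD><body></body></html>';
echo inject_base_href($html, 'http://www.nu.nl/');
```

With the script from the question, the call would be $curl_scraped_page = inject_base_href($curl_scraped_page, $url); just before the echo.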
Source: https://stackoverflow.com/questions/16065036/curl-and-relative-path-in-head