Curl and relative path in <head>

Submitted by 你。 on 2019-12-24 19:04:08

Question


I'm using this script to scrape a website:

<?php
$url = "http://www.nu.nl";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);

echo $curl_scraped_page;
?>

In the output, the JavaScript and CSS files referenced in the head section are loaded from the wrong domain. So I tried to fix it with:

$url = preg_replace("/<head>/i", "<head><base href='$url' />", $url, 1);

Doesn't work, any ideas why? I can't spot anything.

Example


Answer 1:


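You are passing the wrong variable. $curl_scraped_page holds the page's HTML and $url holds only the URL, but you passed $url to preg_replace as the subject and assigned the result back to $url, so the page you echo is never modified. Run the replacement on $curl_scraped_page instead, before echoing it: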

$curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);
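
The one-liner above only matches a bare <head> tag and drops $url straight into the markup. As a rough sketch (not part of the original answer), the same idea can be wrapped in a small helper that also accepts a <head> tag with attributes and escapes the URL first. The function name inject_base_href and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): inserts a <base> tag
// right after the first opening <head ...> tag of an HTML string.
function inject_base_href(string $html, string $baseUrl): string
{
    // Escape the URL so quotes or ampersands cannot break the attribute.
    $baseTag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES, 'UTF-8') . '" />';

    // A callback is used so that "$" or "\" in the URL is never treated
    // as a backreference, which can happen in a preg_replace replacement string.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',              // matches <head> and <head lang="nl">, but not <header>
        function (array $m) use ($baseTag) {
            return $m[0] . $baseTag;     // keep the original tag, append <base>
        },
        $html,
        1                                // only the first match
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

// Sample markup, made up for this example.
$page = '<html><head lang="nl"><link rel="stylesheet" href="/css/site.css"></head><body></body></html>';
echo inject_base_href($page, 'http://www.nu.nl/');
```

In the scraping script, this would be called as $curl_scraped_page = inject_base_href($curl_scraped_page, $url); just before the echo.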


Source: https://stackoverflow.com/questions/16065036/curl-and-relative-path-in-head
