PHP: cURL and keep track of all redirections

故里飘歌 2020-12-16 04:17

I'm looking to cURL a URL and keep track of each individual URL it goes through. For some reason I am unable to accomplish this without doing recursive cURL calls which is

3 Answers
  • 2020-12-16 04:28

    With libcurl, you can query CURLINFO_REDIRECT_URL through curl_getinfo() to find out which URL the request would have been redirected to had CURLOPT_FOLLOWLOCATION been enabled. This lets a program walk the redirect chain itself, one hop at a time.

    This approach is better and easier than parsing Location: headers, as the other answers here suggest, because with header parsing your code has to resolve relative paths itself. CURLINFO_REDIRECT_URL does that for you and returns an absolute URL.

    The PHP/CURL binding added support for this feature in PHP 5.3.7:

    $url = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    

    The commit that fixed this:

    https://github.com/php/php-src/commit/689268a0ba4259c8f199cae6343b3d17cab9b6a5
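
    As a rough illustration of the traversal described above, the sketch below asks libcurl for the next hop until there is none. The names curlNextUrl() and traceRedirects(), the $nextUrl hook, the 10-hop limit and the use of HEAD requests are all assumptions made for this example, not part of PHP or libcurl; some servers answer HEAD differently from GET, in which case CURLOPT_NOBODY should be dropped.

```php
<?php
// Sketch: collect every URL in a redirect chain via CURLINFO_REDIRECT_URL.
// curlNextUrl(), traceRedirects() and the $nextUrl hook are hypothetical names.

function curlNextUrl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // libcurl reports the redirect, we follow it
    curl_setopt($ch, CURLOPT_NOBODY, true);          // assumption: a HEAD request is enough
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    // Absolute URL libcurl would have followed; empty or false when there is no redirect.
    $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $next ? $next : null;
}

function traceRedirects($url, $maxHops = 10, $nextUrl = 'curlNextUrl')
{
    $chain = array($url);
    while ($maxHops-- > 0) {
        $next = call_user_func($nextUrl, $url);
        if ($next === null || in_array($next, $chain, true)) {
            break; // no further redirect, or a redirect loop
        }
        $chain[] = $next;
        $url = $next;
    }
    return $chain; // starting URL first, final URL last
}
```

    Calling traceRedirects('some url with redirects') would return the visited URLs in order. The $nextUrl parameter exists only so the loop can be exercised without network access.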

  • 2020-12-16 04:52

    Your code sets:

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    

    With this option, cURL follows the redirects itself and hands you only the final page, with no Location header to inspect.

    To follow the redirects manually instead:

    function getWebPage($url, $redirectcallback = null, $maxRedirects = 10){
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // handle redirects ourselves
        curl_setopt($ch, CURLOPT_HEADER, true);          // keep the response headers in the output
        curl_setopt($ch, CURLOPT_NOBODY, false);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");
    
        $html = curl_exec($ch);
        $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // Follow the common redirect statuses, for at most $maxRedirects hops.
        if ($html !== false && $maxRedirects > 0 && in_array($http_code, array(301, 302, 303, 307, 308))) {
            list($httpheader) = explode("\r\n\r\n", $html, 2);
            $matches = array();
            if (preg_match('/(Location:|URI:)(.*?)\n/', $httpheader, $matches)) {
                $nurl = trim(array_pop($matches));
                // Only absolute URLs are followed; a relative Location would have to be resolved first.
                if (is_string(parse_url($nurl, PHP_URL_SCHEME))) {
                    if ($redirectcallback) { // report the hop: new URL first, then the current URL
                        $redirectcallback($nurl, $url);
                    }
                    $html = getWebPage($nurl, $redirectcallback, $maxRedirects - 1);
                }
            }
        }
        return $html;
    }
    
    function trackAllLocations($newUrl, $currentUrl){
        echo $currentUrl.' ---> '.$newUrl."\r\n";
    }
    
    getWebPage('some url with redirects', 'trackAllLocations');
    
  • 2020-12-16 04:52

    May I make a recommendation...

     preg_match('/(Location:|URI:)(.*?)\n/', $httpheader, $matches);
    

    Change the regex to /(Location:|URI:)(.*?)\n/i so that it is case-insensitive. HTTP header names are not case-sensitive, and I have seen sites send location: with a lower-case l.

    This may explain why the function above sometimes fails to follow a redirect.
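
    To make the suggestion concrete, here is a minimal sketch of a case-insensitive header lookup. extractLocation() is a hypothetical helper name, and the pattern goes slightly beyond the one-character change proposed above: it is anchored to the start of a line so that Content-Location: is not picked up by mistake, and it also matches when the header is the last line of the block.

```php
<?php
// Sketch: pull the redirect target out of a raw response header block.
// extractLocation() is a hypothetical helper, not a PHP built-in.

function extractLocation($headers)
{
    // /m lets ^ and $ match at each header line; /i ignores the case of the header name.
    if (preg_match('/^(?:Location|URI):[ \t]*(.*?)[ \t]*\r?$/mi', $headers, $m)) {
        return $m[1] !== '' ? $m[1] : null;
    }
    return null; // no redirect header present
}
```

    The value is returned as sent by the server, so a relative path still has to be resolved against the current URL before it can be requested.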
