How to get final URL after following HTTP redirections in pure PHP?

有刺的猬 2020-11-29 03:41

What I'd like to do is find out the final URL after following all the redirections.

I would prefer not to use cURL. I would like t

5 Answers
  •  盖世英雄少女心
    2020-11-29 04:08

    While the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution which has the following advantages:

    • uses cURL for all the heavy lifting, so it works with https
    • copes with servers which return a lowercase location header name (both xaav's and webjay's answers do not handle this)
    • allows you to control how deep you want to go before giving up

    Here's the function:

    function findUltimateDestination($url, $maxRequests = 10)
    {
        $ch = curl_init();
    
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    
        //customize user agent if you desire...
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_exec($ch);
    
        $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    
        curl_close($ch);
        return $url;
    }
    

    Here's a more verbose version which lets you inspect the redirection chain yourself rather than letting cURL follow it.

    function findUltimateDestination($url, $maxRequests = 10)
    {
        $ch = curl_init();
    
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    
        //customize user agent if you desire...
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');
    
        while ($maxRequests--) {
    
            //fetch
            curl_setopt($ch, CURLOPT_URL, $url);
            $response = curl_exec($ch);
    
            //try to determine redirection url
            $location = '';
            if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
                //anchor to the start of a line so "Content-Location:" is not matched
                if (preg_match('/^Location:(.*)$/mi', $response, $match)) {
                    $location = trim($match[1]);
                }
            }
    
            if (empty($location)) {
                //we've reached the end of the chain...
                curl_close($ch);
                return $url;
            }
    
            //build next url (only absolute URLs and absolute paths are handled)
            if ($location[0] == '/') {
                $u = parse_url($url);
                $url = $u['scheme'] . '://' . $u['host'];
                if (isset($u['port'])) {
                    $url .= ':' . $u['port'];
                }
                $url .= $location;
            } else {
                $url = $location;
            }
        }
    
        //gave up after $maxRequests requests
        curl_close($ch);
        return null;
    }
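
    The "build next url" step treats a Location value as either an absolute URL or an absolute path. Servers may also send protocol-relative values (//host/path) or plain relative paths (page.html). A minimal sketch of a more general resolver follows, assuming PHP 7 or later. The name resolveLocation is hypothetical and not part of the original answer, and the sketch does not normalise "../" segments or handle query-only values.

    ```php
    <?php
    // Hypothetical helper (not from the original answer): resolve a Location
    // header value against the URL of the response that carried it.
    function resolveLocation(string $base, string $location): string
    {
        // Already absolute, e.g. "https://example.com/a"
        if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $location)) {
            return $location;
        }

        $u = parse_url($base);
        $origin = $u['scheme'] . '://' . $u['host'];
        if (isset($u['port'])) {
            $origin .= ':' . $u['port'];
        }

        // Protocol-relative, e.g. "//cdn.example.com/a": reuse the base scheme
        if (substr($location, 0, 2) === '//') {
            return $u['scheme'] . ':' . $location;
        }

        // Absolute path, e.g. "/a/b"
        if ($location !== '' && $location[0] === '/') {
            return $origin . $location;
        }

        // Relative path, e.g. "b.html": replace the last segment of the base path
        $path = isset($u['path']) ? $u['path'] : '/';
        $dir = substr($path, 0, strrpos($path, '/') + 1);
        return $origin . $dir . $location;
    }
    ```

    Inside the loop, the whole if/else could then be replaced by $url = resolveLocation($url, $location);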
    

    As an example of a redirection chain which this function handles, but the others do not, try this:

    echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');
    

    At the time of writing, this involves 4 requests, with a mixture of Location and location headers involved.
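
    To illustrate the mixed-case point, the header lookup can be isolated into a small helper and tried against hand-written responses. This is only a sketch, assuming PHP 7.1 or later. The name extractLocation and the sample header blocks are made up for illustration. The match is anchored to the start of a line so that a Content-Location header is not picked up by mistake.

    ```php
    <?php
    // Hypothetical helper: pull the Location value out of a raw header block,
    // ignoring the case of the header name.
    function extractLocation(string $rawHeaders): ?string
    {
        // "i" makes the header name case-insensitive; "m" lets ^ and $ match
        // at line boundaries, so only a header that starts a line is accepted.
        if (preg_match('/^Location:[ \t]*(.*)$/mi', $rawHeaders, $match)) {
            $value = trim($match[1]); // also strips the trailing "\r"
            return $value === '' ? null : $value;
        }
        return null;
    }

    // Made-up sample responses, one per header spelling
    $upper = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/a\r\n\r\n";
    $lower = "HTTP/1.1 302 Found\r\nlocation: /b\r\n\r\n";

    var_dump(extractLocation($upper));
    var_dump(extractLocation($lower));
    ```

    Both calls return the header value regardless of how the server spelled the header name; a response without a Location header yields null.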
