cURL returns 404 while the page is found in browser

误落风尘 2020-12-06 19:13

There are already similar questions on Stack Overflow, but none of their solutions have worked for me. I'm trying to grab a page on LoveIt.com with cURL, but it return

4 answers
  • 2020-12-06 19:42

    I just had a similar issue with a site. In my case they were expecting a USER_AGENT to be set so anyone with this issue in the future should also check that.
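
    A minimal sketch of that check, assuming the PHP cURL extension is available. The helper names `build_user_agent_options` and `fetch_status` and the User-Agent string are illustrative, not part of the original answer; the idea is simply to compare the status code you get with and without the header.

```php
<?php
// Hypothetical helper: cURL options for a GET request with an explicit User-Agent.
function build_user_agent_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // some sites reject requests without one
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
    ];
}

// Hypothetical helper: performs the request and returns the HTTP status code
// (0 if the request failed before any response was received).
function fetch_status(array $options): int
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Usage (performs a network request, so it is left commented out):
// $ua = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0';
// echo fetch_status(build_user_agent_options('https://example.com/', $ua)), "\n";
```

    If the status changes from 404 to 200 once the header is set, the site is probably filtering on it.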

  • 2020-12-06 19:45

    You don't need to save the cookie file via Chrome.

    You can create a function to get this cookie, and then reuse it.

    Like:

    <?php

    error_reporting(E_ALL);

    class Crawler {

       public $cookie;        // path of the cookie file
       public $http_response; // body of the last response, or false on failure
       public $user_agent;

       function __construct($cookie){
           $this->cookie     = (string) $cookie;
           $this->user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0';
       }

       // First request: only collects the cookies the site sets.
       function get($url){
           $ch = curl_init();
           curl_setopt($ch, CURLOPT_URL, $url); // was $this->url, which is never defined
           curl_setopt($ch, CURLOPT_NOBODY, 1); // headers only; enough to receive cookies
           curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
           curl_setopt($ch, CURLOPT_USERAGENT, $this->user_agent);
           // Here we create the file with cookies
           // (it is written when the handle is released at the end of this method)
           curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);
           $this->http_response = curl_exec($ch);
       }

       // Second request: sends the saved cookies and fetches the page body.
       function get_with_cookies($url){
           $ch = curl_init();
           curl_setopt($ch, CURLOPT_URL, $url);
           // Return the body as a string instead of printing it
           curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
           curl_setopt($ch, CURLOPT_USERAGENT, $this->user_agent);
           curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);

           // Here we re-use the cookie file, so the saved cookies are sent
           curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookie);
           $this->http_response = curl_exec($ch);
        }
    }

    $crawler = new Crawler('cookie_file_name');
    // Creating cookie file
    $crawler->get('uri');
    // Request with the cookies
    $crawler->get_with_cookies('uri');
    

    Regards.

  • 2020-12-06 19:51

    I quickly checked the said page with LiveHeaders enabled and noticed a bunch of cookies being set. I suspect that, since it's not a "normal" URL, you need to hand those cookies back while being redirected; otherwise you end up being kicked out with a 404. Use CURLOPT_COOKIEJAR with your cURL instance from the start. See: http://php.net/manual/pl/function.curl-setopt.php
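
    One way this could look, as a sketch rather than a tested fix for this particular site: enable the cookie engine and follow redirects on the same handle, then inspect the final status and URL. The helper names `build_redirect_cookie_options` and `fetch_final` and the redirect limit are assumptions made for illustration.

```php
<?php
// Hypothetical helper: cURL options that keep cookies while following redirects.
function build_redirect_cookie_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,          // illustrative limit, not from the thread
        CURLOPT_COOKIEFILE     => $cookieFile, // read saved cookies, enables the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back when the handle is released
    ];
}

// Hypothetical helper: runs the request and reports where it ended up.
function fetch_final(array $options): array
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    return [
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'url'    => (string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'   => $body, // false if the transfer failed
    ];
}

// Usage (performs a network request, so it is left commented out):
// $result = fetch_final(build_redirect_cookie_options('https://example.com/', './cookie.txt'));
// echo $result['status'], ' ', $result['url'], "\n";
```

    Comparing the reported final URL with the one shown in the browser can help confirm whether a redirect is involved.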

  • 2020-12-06 19:57

    Thanks for your answer. I visited the page and saved the cookies in a cookies.txt file (with the Chrome extension cookie.txt export), which I pass not to CURLOPT_COOKIEJAR but to the CURLOPT_COOKIEFILE option.

    $cookiefile = './cookie.txt';
    
    curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);
    

    And now it works! Thanks for your feedback, it was really useful.
