How to pass Steam's age verification when scraping a page with DOM

天命终不由人 2021-01-19 02:19

I'm attempting to pull some image URLs from Steam store pages, such as:

http://store.steampowered.com/app/35700/
http://store.steampowered.com/app/252490/

Her

2 Answers
  • 2021-01-19 02:27

    Solved! Steam skips the age check when the request carries a birthtime cookie, so send one with cURL. Here's the working code:

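    $url = 'http://store.steampowered.com/app/35700/';
    
    $ch = curl_init();
    
    // Steam skips its age check when a birthtime cookie is present.
    // 28801 is a Unix timestamp (1970-01-01 08:00:01 UTC). CURLOPT_COOKIE takes
    // only name=value pairs, so path/domain attributes don't belong here.
    curl_setopt($ch, CURLOPT_COOKIE, 'birthtime=28801');
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    
    $result = curl_exec($ch);
    if ($result === false) {
        die('Request failed: ' . curl_error($ch));
    }
    curl_close($ch);
    
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // only takes effect if set before loading
    libxml_use_internal_errors(true);  // real-world HTML is rarely valid; silence warnings
    $dom->loadHTML($result);
    libxml_clear_errors();
    
    // Print the src attribute of every <img> on the page.
    foreach ($dom->getElementsByTagName('img') as $image) {
        echo $image->getAttribute('src') . PHP_EOL;
    }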
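For comparison, the parsing half of the PHP answer (collecting the src of every img element) can be sketched in Python with only the standard library's html.parser, no DOM extension needed. The class and function names (ImgSrcCollector, extract_img_srcs) are hypothetical, chosen for illustration. One behavioural difference to be aware of: PHP's getAttribute returns an empty string for an img without a src, so the PHP loop prints a blank line for it, whereas this sketch simply skips such tags.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs. Self-closing <img ... /> tags also end up here,
        # because the default handle_startendtag calls handle_starttag.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src is not None:
                self.srcs.append(src)


def extract_img_srcs(html):
    """Return the src values of all <img> tags in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

For example, extract_img_srcs('<div><img src="a.jpg"><IMG SRC="b.png" alt="x"/><img alt="no src"></div>') returns ['a.jpg', 'b.png'].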
    
  • 2021-01-19 02:34

    You were looking for PHP answers, but I was trying to do the same thing in Python and this was the most relevant question. Your PHP answer helped me out, so maybe a Python solution will help someone else. My solution uses python-requests in Python 2.7:

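        import requests
    
        url = 'http://store.steampowered.com/app/252490/'
        # Only the cookie's name and value are sent; path and domain are cookie
        # attributes, not separate cookies, so they don't belong in this dict.
        cookies = {'birthtime': '28801'}
    
        r = requests.get(url, cookies=cookies, timeout=5)
        # Fail loudly if the request failed or Steam still served the age-check form
        # (an explicit raise, because assert statements are stripped under python -O).
        if r.status_code != 200 or 'Please enter your birth date to continue' in r.text:
            raise RuntimeError('Failed to retrieve page for {url}. Error={code}.'.format(url=url, code=r.status_code))
    
        print r.text.encode('utf-8')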
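If python-requests isn't available, the same request can be sketched for Python 3 with only urllib.request: send the birthtime cookie as a raw Cookie header and check the response for the age-gate text, as the answer above does. This assumes Steam still honours the birthtime cookie the way it did when these answers were written; the helper names (build_request, looks_like_age_gate, fetch_store_page) are hypothetical.

```python
import urllib.request

# Text the answer above uses to detect that Steam served the age-check form.
AGE_GATE_MARKER = 'Please enter your birth date to continue'


def build_request(url, birthtime='28801'):
    # Assumption carried over from the answers: Steam treats a birthtime cookie
    # (a Unix timestamp; 28801 is 1970-01-01 08:00:01 UTC) as a passed age check.
    return urllib.request.Request(url, headers={'Cookie': 'birthtime=' + birthtime})


def looks_like_age_gate(html):
    return AGE_GATE_MARKER in html


def fetch_store_page(url, timeout=5):
    """Fetch a store page and raise if the age gate was not bypassed."""
    req = build_request(url)
    # urlopen follows redirects (keeping the Cookie header) and raises
    # HTTPError for non-2xx responses, so no manual status check is needed.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        html = resp.read().decode(charset, errors='replace')
    if looks_like_age_gate(html):
        raise RuntimeError('Age check was not bypassed for {}'.format(url))
    return html
```

Usage would be html = fetch_store_page('http://store.steampowered.com/app/252490/'), after which the returned HTML can be parsed for img tags as in the answers above.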
    