Grab/download images from multiple pages using php preg_match_all & cURL

放肆的年华 提交于 2019-12-21 02:36:07

问题


So I'm trying to grab some images from another site, the problem is each image is on a different page

IE: id/1, id/2, id/3 etc etc

so far I have the code below which can grab an image from the single URL given using:

$returned_content = get_data('http://somedomain.com/id/1/');

but need to make the line above become an array (I guess) so it will grab the image from page 1 then go on to grab the next image on page 2 then page 3 etc etc automatically

function get_data($url){
 $ch = curl_init();
 $timeout = 5;
  curl_setopt($ch,CURLOPT_URL,$url);
  curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
  curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
  curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
 $data = curl_exec($ch);
  curl_close($ch);
 return $data;
}

$returned_content = get_data('http://somedomain.com/id/1/');

if (preg_match_all("~http://somedomain.com/images/(.*?)\.jpg~i", $returned_content, $matches)) {

$src = 0;
      foreach ($matches[1] as $key) {

if(++$src > 1) break;

          $out = $key;
      }

      $file = 'http://somedomain.com/images/' . $out . '.jpg';


$dir = 'photos'; 

$imgurl = get_data($file);

file_put_contents($dir . '/' . $out . '.jpg', $imgurl);

echo  'done';
}

As always all help is appreciated and thanks in advance.


回答1:


This was pretty confusing, because it sounded like you were only interested in saving one image per page. But then the code makes it look like you're actually trying to save every image on each page. So it's entirely possible I completely misunderstood... But here goes.

Looping over each page isn't that difficult:

$i = 1;
$l = 101;

while ($i < $l) {
    $html = get_data('http://somedomain.com/id/'.$i.'/');
    getImages($html);
    $i += 1;
}

The following then assumes that you're trying to save all the images on that particular page:

function getImages($html) {
    $matches = array();
    $regex = '~http://somedomain.com/images/(.*?)\.jpg~i';
    preg_match_all($regex, $html, $matches);
    foreach ($matches[1] as $img) {
        saveImg($img);
    }
}

function saveImg($name) {
    $url = 'http://somedomain.com/images/'.$name.'.jpg';
    $data = get_data($url);
    file_put_contents('photos/'.$name.'.jpg', $data);
}


来源:https://stackoverflow.com/questions/7747875/grab-download-images-from-multiple-pages-using-php-preg-match-all-curl

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!