Loop through webpages and download all images

Posted by 与世无争的帅哥 on 2021-01-29 11:12:25

Question


I have a nice URL structure to loop through:

https://marco.ccr.buffalo.edu/images?page=0&score=Clear
https://marco.ccr.buffalo.edu/images?page=1&score=Clear
https://marco.ccr.buffalo.edu/images?page=2&score=Clear
...

I want to loop through each of these pages and download the 21 images (JPEG or PNG). I've seen several Beautiful Soup examples, but I'm still struggling to get something that will download multiple images and loop through the URLs. I think I can use urllib to loop through each URL like this, but I'm not sure where the image saving comes in. Any help would be appreciated, and thanks in advance!

for i in range(0,10):
    urllib.urlretrieve('https://marco.ccr.buffalo.edu/images?page=' + str(i) + '&score=Clear')
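
A note on the loop above: in Python 3, urlretrieve lives in urllib.request rather than directly in urllib, and the query string can be assembled with urllib.parse.urlencode instead of string concatenation. A minimal sketch of generating the page URLs, assuming the page and score parameters shown above; page_url and BASE_URL are hypothetical names, and no request is made here:

```python
from urllib.parse import urlencode

BASE_URL = "https://marco.ccr.buffalo.edu/images"

def page_url(page: int, score: str = "Clear") -> str:
    """Build the listing URL for one page (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'page': page, 'score': score})}"

# URLs for pages 0 through 9, matching range(0, 10) in the loop above
urls = [page_url(i) for i in range(10)]
print(urls[0])
```

Each of these URLs points at an HTML listing page, not at an image, so the page still has to be parsed to find the image links before anything is saved.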

I was trying to follow this post but I was unsuccessful: How to extract and download all images from a website using beautifulSoup?


Answer 1:


You can use requests to fetch each page and BeautifulSoup to find the image links:

from bs4 import BeautifulSoup as soup
import requests, contextlib, re, os

@contextlib.contextmanager
def get_images(url: str):
  # Parse one listing page and yield [src, extension] for every link to /image/<id>;
  # the extension is taken from the end of the thumbnail's alt text
  d = soup(requests.get(url).text, 'html.parser')
  yield [[i.find('img')['src'], re.findall(r'(?<=\.)\w+$', i.find('img')['alt'])[0]] for i in d.find_all('a', href=True) if re.findall(r'/image/\d+', i['href'])]

n = 3 # number of pages to fetch (pages 0 to n-1)
os.makedirs('MARCO_images', exist_ok=True) # create the output folder; it can be named anything, as long as the same name is used when saving below
for i in range(n):
  with get_images(f'https://marco.ccr.buffalo.edu/images?page={i}&score=Clear') as links:
    print(links)
    for c, [link, ext] in enumerate(links, 1):
      # page index and image counter are separated so names stay unique across pages
      with open(f'MARCO_images/MARCO_img_{i}_{c}.{ext}', 'wb') as f:
        f.write(requests.get(f'https://marco.ccr.buffalo.edu{link}').content)
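
The loop above builds each file name from the page index and the image counter. A minimal sketch of a naming helper that zero-pads both parts, so names stay unique and sort in download order; image_path, the "p" prefix, and the padding widths are assumptions for illustration, not part of the original answer:

```python
import os

def image_path(folder: str, page: int, index: int, ext: str) -> str:
    """Return a path such as MARCO_images/MARCO_img_p000_11.jpg (hypothetical naming scheme)."""
    name = f"MARCO_img_p{page:03d}_{index:02d}.{ext}"
    return os.path.join(folder, name)

# Page 1, image 11 and page 11, image 1 no longer share a name
print(image_path("MARCO_images", 1, 11, "jpg"))
print(image_path("MARCO_images", 11, 1, "jpg"))
```

The result can be passed straight to open(..., 'wb') in place of the f-string path.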

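Inspecting the contents of the MARCO_images directory afterwards gives a listing like the following (this sample shows 21 files named by image counter only; the exact names depend on the pattern used when saving):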

print(os.listdir('/Users/ajax/MARCO_images'))

Output:

['MARCO_img_1.jpg', 'MARCO_img_10.jpg', 'MARCO_img_11.jpg', 'MARCO_img_12.jpg', 'MARCO_img_13.jpg', 'MARCO_img_14.jpg', 'MARCO_img_15.jpg', 'MARCO_img_16.jpg', 'MARCO_img_17.jpg', 'MARCO_img_18.jpg', 'MARCO_img_19.jpg', 'MARCO_img_2.jpg', 'MARCO_img_20.jpg', 'MARCO_img_21.jpg', 'MARCO_img_3.jpg', 'MARCO_img_4.jpg', 'MARCO_img_5.jpg', 'MARCO_img_6.jpg', 'MARCO_img_7.jpg', 'MARCO_img_8.jpg', 'MARCO_img_9.jpg']
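
The listing above is in plain string order, so MARCO_img_10.jpg comes before MARCO_img_2.jpg (and os.listdir itself makes no ordering guarantee). A minimal sketch of sorting such names by their numeric part; natural_key is a hypothetical helper, not part of the original answer:

```python
import re

def natural_key(name: str):
    """Split a name into text and integer chunks so 'img_2' sorts before 'img_10'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

names = ["MARCO_img_1.jpg", "MARCO_img_10.jpg", "MARCO_img_2.jpg"]
print(sorted(names, key=natural_key))
# → ['MARCO_img_1.jpg', 'MARCO_img_2.jpg', 'MARCO_img_10.jpg']
```

The same key can be applied to the result of os.listdir: sorted(os.listdir('MARCO_images'), key=natural_key).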


Source: https://stackoverflow.com/questions/51599798/loop-through-webpages-and-download-all-images
