How to extract and download all images from a website using BeautifulSoup?

南笙 2020-11-27 18:43

I am trying to extract and download all images from a URL. I wrote this script:

import urllib2
import re
from os.path import basename
from urlparse import urlsplit
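
Note that urllib2 and urlparse only exist in Python 2; under Python 3 the same imports would presumably look like this:

import re
from os.path import basename
from urllib.request import urlopen   # replaces urllib2
from urllib.parse import urlsplit    # the urlparse module moved to urllib.parse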


        
2 Answers
  •  暖寄归人
    2020-11-27 19:08

    The following should extract all images from a given page and write them to the directory the script is run from.

    import re
    import requests
    from bs4 import BeautifulSoup

    site = 'http://pixabay.com'

    response = requests.get(site)

    soup = BeautifulSoup(response.text, 'html.parser')
    img_tags = soup.find_all('img')

    # skip any <img> tags that have no src attribute
    urls = [img['src'] for img in img_tags if img.get('src')]

    for url in urls:
        filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
        if not filename:
            print("Regex didn't match with the url: {}".format(url))
            continue
        # sometimes an image source is relative; if so, prepend the
        # base url, which also happens to be the site variable here
        if 'http' not in url:
            url = '{}{}'.format(site, url)
        response = requests.get(url)
        with open(filename.group(1), 'wb') as f:
            f.write(response.content)
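
    A more robust way to resolve relative image sources (e.g. /static/foo.png or protocol-relative //cdn.example.com/foo.png) is urllib.parse.urljoin, which joins them against the page URL instead of plain string concatenation. A minimal sketch of that variation, assuming Python 3 and the same site and regex as above:

    import re
    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    site = 'http://pixabay.com'
    soup = BeautifulSoup(requests.get(site).text, 'html.parser')

    for img in soup.find_all('img'):
        src = img.get('src')
        if not src:
            continue
        # urljoin handles absolute, root-relative and protocol-relative sources
        full_url = urljoin(site, src)
        match = re.search(r'/([\w_-]+[.](jpg|gif|png))$', full_url)
        if not match:
            continue
        with open(match.group(1), 'wb') as f:
            f.write(requests.get(full_url).content)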
    
