How to extract and download all images from a website using BeautifulSoup?

南笙 2020-11-27 18:43

I am trying to extract and download all images from a URL. I wrote this script:

import urllib2
import re
from os.path import basename
from urlparse import urlsp         
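A minimal sketch of the general approach the question describes, written for Python 3 and assuming the third-party `beautifulsoup4` package is installed: parse the page, collect the `src` of every `<img>` tag, resolve relative paths with `urljoin`, and save each file. The function names `image_urls` and `download_images` are hypothetical. Images inserted by JavaScript or set as CSS backgrounds will not be found this way.

```python
import os
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def image_urls(html, page_url):
    """Return the absolute URL of every <img> with a src, in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img", src=True):
        # Resolve relative paths such as "/pics/a.jpg" against the page URL
        absolute = urljoin(page_url, img["src"])
        if absolute not in urls:
            urls.append(absolute)
    return urls


def download_images(page_url, dest_dir="."):
    """Fetch page_url, save every image it references into dest_dir, return the saved paths."""
    with urlopen(page_url) as response:
        html = response.read()
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in image_urls(html, page_url):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip data: URIs and other non-HTTP sources
        # Name the file after the last path segment, ignoring any query string
        name = os.path.basename(parts.path)
        if not name:
            continue  # a URL ending in "/" has no usable file name
        path = os.path.join(dest_dir, name)
        urlretrieve(url, path)
        saved.append(path)
    return saved
```

For example, `download_images("http://example.com/gallery/", "images")` (placeholder URL) would save every image on that page into an `images` folder. Images that share a file name overwrite each other in this sketch.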


        
2 Answers
  •  野趣味 (OP)
    2020-11-27 19:21

    If you only want the pictures, you can download them directly without scraping the web page at all. They all share the same URL pattern, differing only in the number:

    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute1.jpg
    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute2.jpg
    ...
    http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-cutest-pics-gallery/cute10.jpg
    

    So code as simple as this will give you all the images:

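    import os
    import urllib.request
    
    # Every image in the gallery follows this pattern: cute1.jpg ... cute10.jpg
    baseUrl = ("http://filmygyan.in/wp-content/gallery/katrina-kaifs-top-10-"
               "cutest-pics-gallery/cute%s.jpg")
    
    for i in range(1, 11):
        url = baseUrl % i
        # Save each image in the current directory under its own file name
        urllib.request.urlretrieve(url, os.path.basename(url))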
    

    With BeautifulSoup you would have to follow the link to each next page to scrape the images. If you want to scrape each page individually, try to scrape them using their class, which is shutterset_katrina-kaifs-top-10-cutest-pics-gallery.
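A minimal BeautifulSoup sketch of the class-based approach, assuming Python 3 and the `beautifulsoup4` package. Which tag carries the `shutterset_...` class is an assumption, so the hypothetical `gallery_image_urls` function accepts either an `<a>` (taking its `href`, presumably the full-size image) or an `<img>` (taking its `src`). It processes the HTML of a single page; following the links to the next pages is not covered.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Class name quoted in the answer; which element carries it is an assumption
GALLERY_CLASS = "shutterset_katrina-kaifs-top-10-cutest-pics-gallery"


def gallery_image_urls(html, page_url, css_class=GALLERY_CLASS):
    """Collect absolute image URLs from elements whose class list contains css_class."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # class_ matches any one of an element's space-separated classes
    for tag in soup.find_all(class_=css_class):
        if tag.name == "a" and tag.get("href"):
            link = tag["href"]  # assumed to point at the full-size image
        elif tag.name == "img" and tag.get("src"):
            link = tag["src"]
        else:
            continue  # the class is on some other element; nothing to collect
        absolute = urljoin(page_url, link)
        if absolute not in urls:
            urls.append(absolute)
    return urls
```

The returned URLs can then be saved the same way as in the loop above, with `urlretrieve(url, os.path.basename(url))`.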
