Downloading Images with Beautifulsoup without HTML 'img' tag

好久不见. Submitted on 2019-12-23 03:22:23

Question


I'm using BeautifulSoup to find and download images from a given website. However, the website contains images that aren't in the usual <img src="icon.gif"/> format.

The ones causing me problems look like this, for example:

<form action="example.jpg">

<!-- <img src="big.jpg" /> -->

background-image:url("xine.png");

My code to find the images is:

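import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup

webpage = "https://example.com/images/"
soup = BeautifulSoup(urlopen(webpage), "html.parser")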

for img in soup.find_all('img'):
    img_url = urljoin(webpage, img['src'])
    file_name = img['src'].split('/')[-1]
    file_path = os.path.join("C:\\users\\images", file_name)
    urlretrieve(img_url, file_path)

I think I might have to use a regex, but hopefully I don't have to.

Thanks in advance


Answer 1:


Modify the path you pass to urlretrieve to specify exactly where you want the file to be copied to:

# Raw string, so that "\f" is not interpreted as a form-feed escape
file_path = os.path.join(r'c:\files\cw\downloads', file_name)
urlretrieve(img_url, file_path)
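
As a rough sketch of the same idea, the destination path can be derived from the URL's path component instead of the raw src string, so that a query string such as "?v=2" does not end up in the file name. The helper name destination_for, the "unnamed" fallback, and the download directory are assumptions made for illustration; they are not part of the original answer.

```python
import os
from urllib.parse import urlparse, unquote


def destination_for(img_url, download_dir):
    """Return the local path an image URL would be saved to (hypothetical helper)."""
    # Use only the path component so query strings and fragments are dropped.
    url_path = urlparse(img_url).path
    file_name = unquote(url_path.rsplit("/", 1)[-1])
    # Fall back to a fixed name when the URL ends in "/" and has no file name.
    if not file_name:
        file_name = "unnamed"
    return os.path.join(download_dir, file_name)
```

The result can then be passed to urlretrieve as the second argument, in place of file_path.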

Edit: It looks like you are also trying to find img tags inside comments. Building on "Find specific comments in HTML code using python":

...
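# Assumes "import bs4" and the soup object from the question
imgs = soup.find_all('img')
# Comment nodes are strings of type bs4.Comment
comments = soup.find_all(string=lambda text: isinstance(text, bs4.Comment))
for comment in comments:
    # Re-parse the comment body as HTML so any img tags inside it can be found
    comment_soup = bs4.BeautifulSoup(comment, "html.parser")
    imgs.extend(comment_soup.find_all('img'))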

for img in imgs:
    ...
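
The comment approach above covers only one of the three cases in the question. Below is a minimal sketch for the other two, assuming the image URL sits either in a form's action attribute or in a CSS url(...) value inside a style attribute or a <style> block. The function name find_extra_image_urls, the extension list, and the regex are illustrative assumptions; the regex is deliberately simple and will not handle every valid CSS url() form.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Simple pattern for url("...") / url('...') / url(...) in CSS text (assumption:
# no escaped quotes or parentheses inside the URL).
CSS_URL_RE = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def find_extra_image_urls(html, base_url):
    """Collect image URLs from form actions and CSS url() values (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    found = []

    # Case 1: <form action="example.jpg">, kept only if it looks like an image.
    for form in soup.find_all("form", action=True):
        action = form["action"]
        if action.lower().endswith(IMAGE_EXTENSIONS):
            found.append(urljoin(base_url, action))

    # Case 2: CSS in inline style attributes and in <style> blocks.
    css_chunks = [tag["style"] for tag in soup.find_all(style=True)]
    css_chunks += [tag.string for tag in soup.find_all("style") if tag.string]
    for css in css_chunks:
        for match in CSS_URL_RE.findall(css):
            found.append(urljoin(base_url, match))

    return found
```

The returned URLs are already absolute, so they can be downloaded with the same urlretrieve loop used for the img tags. A regex is used only on the CSS text here, since BeautifulSoup does not parse CSS itself.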


Source: https://stackoverflow.com/questions/47541274/downloading-images-with-beautifulsoup-without-html-img-tag
