Question
Can you please suggest a fix? The script downloads almost all of the images from imgur pages that contain a single image, but it fails in this case. I'm not sure why it isn't working or how to fix it.
elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
        or submission.url.endswith('webm')
        or submission.url.endswith('mp4')
        or 'all' in submission.url
        or '#' in submission.url
        or '/a/' in submission.url):
    html_source = requests.get(submission.url).text  # download the image's page
    soup = BeautifulSoup(html_source, "lxml")
    image_url = soup.select('img')[0]['src']
    if image_url.startswith('//'):
        image_url = 'http:' + image_url
    image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
    try:
        image_file = urllib2.urlopen(image_url, timeout=5)
        with open('/home/mona/computer_vision/image_retrieval/images/' + category + '/' + 'imgur_' + datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-9:], 'wb') as output_image:
            output_image.write(image_file.read())
    except urllib2.URLError as e:
        print(e)
        continue
The error is:
[LOG] Done Getting http://i.imgur.com/FoCjtI7.jpg
submission id is: 1alffm
[LOG] Getting url: http://sphotos-a.ak.fbcdn.net/hphotos-ak-ash4/217834_10151246341237704_484810759_n.jpg
HTTP Error 403: Forbidden
[LOG] Getting url: http://imgur.com/xp386
Traceback (most recent call last):
  File "download_images.py", line 67, in <module>
    soup = BeautifulSoup(html_source, "lxml")
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 155, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
Answer 1:
Open a Python shell and try the following:
from bs4 import BeautifulSoup
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")
Does that work, or do you get the same error? If you get the same error, lxml is missing. Install it:
pip install lxml
I'm walking through these steps because you say the script runs for a good while before crashing, in which case it seems odd that the parser would be missing.
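If installing lxml is not an option, one possible workaround is to fall back to Python's built-in "html.parser", which BeautifulSoup accepts without any extra package. The sketch below only picks a parser name; pick_parser and its parameters are hypothetical names introduced for illustration, not part of bs4 or of the original script.

```python
def pick_parser(module_name="lxml", fallback="html.parser"):
    """Return module_name if that module can be imported, else fallback.

    Hypothetical helper: it assumes the BeautifulSoup parser name matches
    the importable module name, which holds for "lxml".
    """
    try:
        __import__(module_name)
    except ImportError:
        # The module is not installed, so use the built-in parser name.
        return fallback
    return module_name


# Show which parser name would be handed to BeautifulSoup on this machine.
print(pick_parser())
```

In the script from the question, the result would be passed as the second argument, for example BeautifulSoup(html_source, pick_parser()). Note that the two parsers can build slightly different trees from malformed HTML, so the selected img element may not always be the same.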
Added by OP:
If you are using Python 2.7 on Ubuntu/Debian, this worked for me:
$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml
Test it like this:
mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
Source: https://stackoverflow.com/questions/39986835/bs4-featurenotfound-couldnt-find-a-tree-builder-with-the-features-you-requeste