bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml [duplicate]

☆樱花仙子☆ 提交于 2019-12-12 03:42:32

问题


Can you please suggest a fix? It almost download all the images from imgur pages with one single image not sure why it is not working in this case and how to fix it?

elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
                        or submission.url.endswith('webm')
                        or submission.url.endswith('mp4')
                        or 'all' in submission.url
                        or '#' in submission.url
                        or '/a/' in submission.url):
                html_source = requests.get(submission.url).text # download the image's page
                soup = BeautifulSoup(html_source, "lxml")
                image_url = soup.select('img')[0]['src']
                if image_url.startswith('//'):
                image_url = 'http:' + image_url
                image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
                try:
                image_file = urllib2.urlopen(image_url, timeout = 5)
                with open('/home/mona/computer_vision/image_retrieval/images/'+ category+ '/'+ 'imgur_'+ datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-9:], 'wb') as output_image:
                        output_image.write(image_file.read())
                        except urllib2.URLError as e:
                        print(e)
                        continue

The error is:

[LOG] Done Getting http://i.imgur.com/FoCjtI7.jpg
submission id is: 1alffm
[LOG] Getting url:  http://sphotos-a.ak.fbcdn.net/hphotos-ak-ash4/217834_10151246341237704_484810759_n.jpg
HTTP Error 403: Forbidden
[LOG] Getting url:  http://imgur.com/xp386
Traceback (most recent call last):
  File "download_images.py", line 67, in <module>
    soup = BeautifulSoup(html_source, "lxml")
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 155, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

回答1:


Open a python shell and try the following:

from bs4 import BeautifulSoup
myHTML = "<html><head></heda><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")

Does that work, or same error? If same error, you're missing lxml. Install it:

pip install lxml

I'm going through the steps because you indicate that the script works for a good while before crashing, in which case, you can't be missing the parser?

Added by OP:

If you are using Python2.7 in Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml 

Test it like:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml


来源:https://stackoverflow.com/questions/39986835/bs4-featurenotfound-couldnt-find-a-tree-builder-with-the-features-you-requeste

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!