bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Anonymous (unverified), submitted 2019-12-03 02:49:01

Question:

...
    soup = BeautifulSoup(html, "lxml")
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is printed in my Terminal. I am on Mac OS X 10.7.x with Python 2.7.1, and I followed this tutorial to install Beautiful Soup and lxml. Both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen

Any help in figuring out what the problem is and how to solve it would be much appreciated.

Answer 1:

I suspect this is related to the parser that BS uses to read the HTML. They document it here, but if you're like me (on OS X), you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation page above points out that BS4 falls back to Python's built-in HTML parser when no other parser is installed. Assuming you are on OS X, the Apple-bundled version of Python is 2.7.2, and the built-in parser in that version is not lenient with badly formatted markup (the Beautiful Soup documentation notes this for Python versions before 2.7.3). I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the lxml parser:

pip install lxml 

And then try:

soup = BeautifulSoup(html, "lxml") 

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
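One way to use lxml when it is available without crashing when it is not is sketched below, assuming Beautiful Soup 4 is installed: try lxml first and fall back to Python's built-in html.parser when bs4 raises FeatureNotFound (the exception in the question's traceback). The helper make_soup and its preferred parameter are hypothetical names, not part of bs4.

```python
def make_soup(markup, preferred=("lxml", "html.parser")):
    """Parse markup with the first parser in `preferred` that bs4 can find.

    make_soup is a hypothetical helper, not a bs4 API. bs4 is imported lazily
    so this module can still be imported where bs4 is absent.
    """
    from bs4 import BeautifulSoup, FeatureNotFound

    if not preferred:
        raise ValueError("preferred must name at least one parser")

    last_error = None
    for features in preferred:
        try:
            return BeautifulSoup(markup, features)
        except FeatureNotFound as exc:
            # Raised when no installed tree builder matches `features`,
            # e.g. "lxml" when the lxml package is missing.
            last_error = exc
    raise last_error
```

Usage: soup = make_soup(html) behaves like BeautifulSoup(html, "lxml") when lxml is installed and like BeautifulSoup(html, "html.parser") otherwise, so results can differ slightly on malformed HTML.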



Answer 2:

For a basic Python install with bs4, you can parse your document with the html5lib parser (note that html5lib is itself a separate package, installed with pip install html5lib):

soup = BeautifulSoup(html, "html5lib") 

If, however, you want to use formatter='xml', you need to install lxml and use the XML parser:

pip3 install lxml

soup = BeautifulSoup(html, features="xml")
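Which package each features string depends on can be captured in a small lookup, following the parser table in Beautiful Soup's documentation: "html.parser" ships with Python, "lxml" and the XML parser ("xml" or "lxml-xml") need lxml, and "html5lib" needs html5lib. This is a sketch; REQUIRED_PACKAGE and install_hint are hypothetical names, and the check only tests whether the package is importable by the current interpreter.

```python
import importlib.util

# Feature string passed to BeautifulSoup(...) -> package that provides it.
# None means the parser ships with Python's standard library.
REQUIRED_PACKAGE = {
    "html.parser": None,
    "lxml": "lxml",
    "lxml-xml": "lxml",
    "xml": "lxml",
    "html5lib": "html5lib",
}


def install_hint(features):
    """Return a pip command if the package behind `features` is missing, else None."""
    if features not in REQUIRED_PACKAGE:
        raise ValueError(f"unknown parser feature: {features!r}")
    package = REQUIRED_PACKAGE[features]
    if package is None or importlib.util.find_spec(package) is not None:
        return None
    return f"pip install {package}"
```

For example, install_hint("xml") returns "pip install lxml" on a machine without lxml and None once it is installed.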


Answer 3:

I prefer Python's built-in HTML parser: no installation, no dependencies.

soup = BeautifulSoup(s, "html.parser")



Answer 4:

I encountered the same issue and found that the cause was a slightly outdated Python six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six==1.10.0
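The traceback fails because html5lib runs from six import with_metaclass, viewkeys, PY3 and the installed six does not define viewkeys. A quick way to confirm this kind of problem, sketched below with a hypothetical helper missing_names, is to import the module and list which of the required names it lacks.

```python
import importlib


def missing_names(module_name, names):
    """Return the names in `names` that `module_name` does not provide.

    If the module itself cannot be imported, every name is reported missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # The names html5lib imports from six, per the traceback above.
    print(missing_names("six", ["with_metaclass", "viewkeys", "PY3"]))
```

An empty list means six provides everything html5lib asked for; any name printed points to an outdated six.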


Answer 5:

I am using Python 3.6 and got the same error as in the original post. After I ran the command:

python3 -m pip install lxml 

the problem was resolved.
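Running pip as python3 -m pip ties the installation to that specific interpreter, which matters when more than one Python is installed. A minimal sketch of the same idea from inside Python, assuming pip is available for the running interpreter: build the command from sys.executable so lxml is installed for exactly the Python that will run the script. pip_install_cmd and pip_install are hypothetical helpers.

```python
import subprocess
import sys


def pip_install_cmd(package, upgrade=False):
    """Build a pip command that targets the interpreter running this code."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def pip_install(package, upgrade=False):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_cmd(package, upgrade))


if __name__ == "__main__":
    # Print instead of running, so the command can be inspected first.
    print(" ".join(pip_install_cmd("lxml")))
```

Calling pip_install("lxml") actually performs the install; printing the command first makes it easy to see which interpreter it targets.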



Answer 6:

The parser library is either not installed on your machine or cannot be found.

Try running this command from a command prompt:

pip install lxml
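To tell whether a parser is missing or merely not visible to the interpreter that runs your script, a small check like the following can help. It is a sketch for Python 3.4+ (it uses importlib.util.find_spec), and parser_report is a hypothetical name. If lxml shows as missing here even though pip reported a successful install, pip most likely installed it for a different Python.

```python
import importlib.util
import sys


def parser_report(modules=("bs4", "lxml", "html5lib")):
    """Map each top-level module name to whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name, found in parser_report().items():
        print(f"{name:10} {'found' if found else 'MISSING'}")
```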



Answer 7:

Instead of lxml, use the built-in html.parser:

soup = BeautifulSoup(html, 'html.parser') 


Answer 8:

I resolved this error by upgrading my lxml distribution:

pip install -U lxml
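Before and after upgrading, it can help to confirm which lxml version the interpreter actually sees. A sketch assuming Python 3.8+ (importlib.metadata); installed_version is a hypothetical helper that returns None when the distribution is not installed.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("lxml:", installed_version("lxml") or "not installed")
```

If the version printed does not change after pip install -U lxml, pip probably upgraded a different interpreter's copy.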


