bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

白昼怎懂夜的黑 提交于 2019-11-27 02:50:02
James Errico

I have a suspicion that this is related to the parser that BS will use to read the HTML. They document is here, but if you're like me (on OSX) you might be stuck with something that requires a bit of work:

You'll notice that in the BS4 documentation page above, they point out that by default BS4 will use the Python built-in HTML parser. Assuming you are in OSX, the Apple-bundled version of Python is 2.7.2 which is not lenient for character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.

For basic out of the box python with bs4 installed then you can process your xml with

soup = BeautifulSoup(html, "html5lib")

If however you want to use formatter='xml' then you need to

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

I preferred built in python html parser, no install no dependencies soup = BeautifulSoup(s, "html.parser")

Bashar

I am using Python 3.6 and I had the same original error in this post. After I ran the command:

python3 -m pip install lxml

it resolved my problem

Although BeautifulSoup supports the HTML parser by default If you want to use any other third-party Python parsers you need to install that external parser like(lxml).

soup_object= BeautifulSoup(markup,"html.parser") #Python HTML parser

But if you don't specified any parser as parameter you will get an warning that no parser specified.

soup_object= BeautifulSoup(markup) #Warnning

To use any other external parser you need to install it and then need to specify it. like

pip install lxml

soup_object= BeautifulSoup(markup,'lxml') # C dependent parser 

External parser have c and python dependency which may have some advantage and disadvantage.

I encountered the same issue. I found the reason is that I had a slightly-outdated python six package.

>>> import html5lib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six=1.10.0

Instead of using lxml use html.parser, you can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')
abhishekPakrashi

In some references, use the second instead of the first:

soup_object= BeautifulSoup(markup,'html-parser')
soup_object= BeautifulSoup(markup,'html.parser')
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!