How to get rid of BeautifulSoup user warning?

Submitted by ぐ巨炮叔叔 on 2019-11-26 05:30:23

Question


After installing BeautifulSoup, I get this warning whenever I run my Python script from cmd:

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

I have no idea why it appears or how to fix it.


Answer 1:


The solution is stated in the warning message itself. Code like the following does not tell BeautifulSoup which parser (HTML, XML, etc.) to use, so it has to guess:

BeautifulSoup( ... )

To silence the warning, pass the name of the parser you want as the second argument, like so:

BeautifulSoup( ..., "html.parser" )

You can also install a third-party parser, such as lxml or html5lib, and pass its name instead.
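As a minimal sketch of the fix described in this answer, the snippet below builds a soup with the parser named explicitly and records any warnings raised while doing so. The `markup` string and the variable names are made up for illustration; only `BeautifulSoup` and the standard `warnings` module are real APIs here.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
markup = "<p>Hello, <b>world</b></p>"

# Record warnings instead of printing them, so we can inspect them afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Naming the parser explicitly means BeautifulSoup does not have to guess.
    soup = BeautifulSoup(markup, "html.parser")

# Keep only UserWarning (and subclasses), the category shown in the question.
parser_warnings = [w for w in caught if issubclass(w.category, UserWarning)]

print(soup.b.get_text())
print(len(parser_warnings))
```

With the parser named, the list of recorded user warnings should stay empty, and the same code should pick the same parser on any machine where it runs.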




Answer 2:


The documentation recommends installing and using lxml for speed:

BeautifulSoup(html, "lxml")

The documentation also notes that if you're using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it's essential that you install lxml or html5lib, because Python's built-in HTML parser is not very good in those older versions.

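Installing the lxml parser

  • Ubuntu (Debian based)

    apt-get install python-lxml 
    
  • Fedora (RHEL based)

    dnf install python-lxml
    
  • Using pip

    pip install lxml

Note that python-lxml is the Python 2 package name on these distributions; for Python 3 the package is named python3-lxml.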
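Since lxml is an optional install, code that names it can fail on a machine where it is missing. The following is a rough sketch, not something taken from the answers here, of one way to prefer lxml and fall back to the built-in parser. The helper name `make_soup` and its parameters are hypothetical; `FeatureNotFound` is the exception bs4 raises when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Hypothetical helper: try the preferred parser, else use the fallback."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed; use the built-in one instead.
        return BeautifulSoup(markup, fallback)


# Sample markup invented for this illustration.
soup = make_soup("<ul><li>one</li><li>two</li></ul>")
items = [li.get_text() for li in soup.find_all("li")]
print(items)
```

Because a parser name is always passed, the warning from the question is not triggered either way. The trade-off is that the two parsers can build slightly different trees for malformed markup, so pinning a single parser is the safer choice when consistent output matters.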
    



Answer 3:


Another option is the html5lib parser. To use it, install it first:

pip install html5lib

then pass 'html5lib' as the second argument to the BeautifulSoup constructor:

import bs4
# req1 is assumed to be an HTTP response fetched earlier (e.g. with requests)
htmlDoc = bs4.BeautifulSoup(req1.text, 'html5lib')
print(htmlDoc)


Source: https://stackoverflow.com/questions/33511544/how-to-get-rid-of-beautifulsoup-user-warning
