Python HTML parsing

Question

I am trying to write a program that, given a word, looks up its definition and returns it. Although I have gotten this to work, I had to resort to using regex to search for the text between the tags where the definitions are stored. What is a more efficient way to do this using Python 3.x?

Answer 1:

lxml works with Python 3. It has an ElementTree-compatible API but uses C libraries (libxml2 and libxslt) behind the scenes, so it's fast, and it supports XPath, which is (sometimes) a nice way of querying the parsed document.
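
A small sketch of the ElementTree-style API with an XPath query. Since lxml is a third-party package, it uses the standard library's `xml.etree.ElementTree`, whose API lxml mirrors. The dictionary page and its `<span class="def">` markup are made up for illustration, not taken from any real site. ElementTree only supports a limited XPath subset and only accepts well-formed XML. With lxml installed, `lxml.html.fromstring(PAGE)` also accepts ordinary HTML and returns elements with the same `iterfind()` method, plus full XPath through `.xpath("//span[@class='def']/text()")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary page. It must be well-formed XML (XHTML) because
# ElementTree is an XML parser; lxml.html would also accept sloppy HTML.
PAGE = """<html>
  <body>
    <h1>parse</h1>
    <span class="def">to analyse a sentence into its parts</span>
    <span class="example">parse this sentence</span>
    <span class="def">to resolve into component parts</span>
  </body>
</html>"""


def definitions(page):
    """Return the text of every <span class="def"> element in the page."""
    root = ET.fromstring(page)
    # ElementPath, the XPath subset ElementTree supports: any <span> below
    # the root whose class attribute is exactly "def".
    return [el.text.strip() for el in root.iterfind(".//span[@class='def']") if el.text]


print(definitions(PAGE))
# ['to analyse a sentence into its parts', 'to resolve into component parts']
```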

Answer 2:

Try BeautifulSoup, a good HTML parser for Python. (It works with Python 3.x too, although unless you are deep into a Python 3.0 project, consider using 2.7.)

Answer 3:

Yours is a pretty simple requirement as far as HTML parsing goes. The Python standard library includes the ElementTree module, which should help with the task you are planning. Look at the example snippet given on that page.
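
One point worth checking before relying on ElementTree: `xml.etree.ElementTree` is an XML parser, so it only accepts well-formed markup, which in practice means XHTML. The sketch below uses two hypothetical snippets with the same text. It shows the parser accepting a self-closed `<br/>` but raising `ParseError` on the everyday-HTML version, which has a bare `<br>` and an `&nbsp;` entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: the same text written as XHTML and as everyday HTML.
XHTML = "<p>first line<br/>second line</p>"
PLAIN_HTML = "<p>first line<br>second&nbsp;line</p>"  # bare <br>, HTML-only entity


def try_parse(markup):
    """Return the root tag name, or the ParseError message if parsing fails."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


print(try_parse(XHTML))       # p
print(try_parse(PLAIN_HTML))  # a "ParseError: ..." string (wording comes from expat)
```

For pages like the second snippet, the standard library's `html.parser` module or lxml's HTML parser is the tolerant choice.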

Also, never make the mistake of parsing HTML/XML with regular expressions. You never know when the markup will get insanely complicated, and it is a bad idea in any situation.
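
A minimal standard-library sketch makes the regex problem concrete. It uses `html.parser.HTMLParser`, which tolerates ordinary HTML. The page and the `def` class are hypothetical stand-ins for wherever the dictionary site keeps its definitions. A non-greedy pattern stops at the first `</span>`, so a nested tag cuts the definition short. The parser avoids this by tracking how deeply it is nested.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: the first definition contains a nested <span>.
PAGE = ('<div><span class="def">to <b>analyse</b> a <span class="pos">sentence</span>'
        ' into parts</span><br><span class="def">bits &amp; pieces</span></div>')


class DefinitionParser(HTMLParser):
    """Collect the text of every <span> whose class list includes "def"."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; arrives as "&"
        self.depth = 0      # > 0 while inside a definition span
        self.current = []   # text fragments of the definition being read
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "span":
                self.depth += 1  # nested span inside a definition
        elif tag == "span" and "def" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:  # closed the outer definition span
                self.definitions.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_definitions(page):
    parser = DefinitionParser()
    parser.feed(page)
    parser.close()
    return parser.definitions


print(extract_definitions(PAGE))
# ['to analyse a sentence into parts', 'bits & pieces']
print(re.findall(r'<span class="def">(.*?)</span>', PAGE))
# ['to <b>analyse</b> a <span class="pos">sentence', 'bits &amp; pieces']
```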

Source: https://stackoverflow.com/questions/4895102/python-html-parsing

Tags

python

html

python-3.x

html-parsing
