Python fetching <title>

混江龙づ霸主 posted on 2019-12-06 03:57:22

Question


I want to fetch the title of a webpage that I open using urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?

Is there a good parsing lib for this purpose?


Answer 1:


Yes, I would recommend BeautifulSoup.

If you only need the title, it's simply:

soup = BeautifulSoup(html)
myTitle = soup.html.head.title

or

myTitle = soup('title')  # calling the soup returns a list of every matching <title> tag

Taken from the documentation.

It's tolerant of malformed markup and will usually build a usable parse tree even when the HTML is messy.




Answer 2:


Try Beautiful Soup:

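import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()  # raw page source

soup = BeautifulSoup(html)
title = soup.html.head.title  # the <title> tag object
print title.contents  # the tag's children, as a list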
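
The snippet above targets Python 2. On Python 3, urllib2 was split into urllib.request and urllib.error, so the download step might look roughly like the sketch below. The function name fetch_html and its parameters are assumptions made for illustration, not something from the answer. It only covers fetching and decoding; the returned text can then be handed to whichever parser you choose.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10, default_encoding="utf-8"):
    """Download a page and return its body decoded to text.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    # Fall back to a default encoding and replace undecodable bytes
    # instead of raising, since real pages are often mislabeled.
    return raw.decode(charset or default_encoding, errors="replace")
```

A call such as fetch_html('http://www.example.com') would return the page source as a str. Decoding up front is a design choice here: it keeps the later parsing step free of bytes-versus-text concerns.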



Answer 3:


Why are you guys importing a whole extra library for one task? No regular expressions? Wasn't the request about urllib, not bs4 or mechanize, which are third party? To do it with the standard library, scan the HTML, match the line that contains the tag, then split on '>' and '<' with re or whatever.

# html is the page source as a string
Title = None
for line in html.splitlines():
    if '<title>' in line:
        Title = line  # the whole line, tags included
        break

That's Python 2, I think. You can then strip the surrounding tags from the matched line.
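
The standard-library route described in this answer could be sketched in Python 3 as follows. This is only a sketch under assumptions: the names title_by_regex, TitleParser and title_by_parser are made up for illustration, and the page source is assumed to be already decoded to a str. The regex version is short but can be fooled by unusual markup; the html.parser version is a little longer and usually copes better.

```python
import re
from html import unescape
from html.parser import HTMLParser

# Matches the first <title ...>...</title> pair, in any letter case,
# even when the title spans several lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_by_regex(markup):
    """Return the page title found by a regex, or None (hypothetical helper)."""
    match = TITLE_RE.search(markup)
    if match is None:
        return None
    # Decode entities such as &amp; and trim surrounding whitespace.
    return unescape(match.group(1)).strip()


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any later <title> tags.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip() or None


def title_by_parser(markup):
    """Return the page title found with html.parser, or None."""
    parser = TitleParser()
    parser.feed(markup)
    parser.close()
    return parser.title
```

Either function takes the decoded page source and returns the title text, or None when no title is found. Neither needs anything outside the standard library, which is the point this answer makes.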




Answer 4:


Use Beautiful Soup.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string  # the text inside <title>


Source: https://stackoverflow.com/questions/1660302/python-fetching-title
