Python fetching <title>

Question

I want to fetch the title of a webpage that I open with urllib2. What is the best way to parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?

Is there a good parsing library for this purpose?

Answer 1:

Yes, I would recommend BeautifulSoup.

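If you're getting the title, it's simply:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html: the page source read with urllib2
myTitle = soup.html.head.title             # the <title> Tag object

or

myTitle = soup('title')  # shorthand for soup.find_all('title'): a list of every <title> tag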

Taken from the documentation

It's very robust and copes well with messy, malformed HTML.

Answer 2:

Try Beautiful Soup:

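import urllib2
from bs4 import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html, 'html.parser')
title = soup.html.head.title
print title.contents  # list of the tag's children, i.e. the title text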
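In Python 3, urllib2 no longer exists: its urlopen lives in urllib.request, and read() returns bytes rather than str. The sketch below is one way to port the fetching step, assuming the server's Content-Type header names the correct charset and falling back to UTF-8 when it does not. fetch_html and its parameters are hypothetical names chosen for illustration, not part of this answer.

```python
import urllib.request


def fetch_html(url, default_charset="utf-8", timeout=10):
    """Download a page and return its body as text (hypothetical helper).

    Assumes the charset declared in the Content-Type header is correct;
    falls back to default_charset when the header does not declare one.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" substitutes undecodable bytes instead of raising
    return raw.decode(charset, errors="replace")
```

The returned string can then be handed to BeautifulSoup (or any other parser) in place of the raw bytes. Decoding with errors="replace" trades strictness for robustness; drop it if a wrong charset should fail loudly.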

Answer 3:

Why import a whole extra library for one task? Why not regular expressions? Wasn't the question about urllib, not bs4 or mechanize, which are third-party? To do it with the standard library, read the HTML, find the string that contains the title, then split on '>' and '<' with re or whatever you prefer.

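Title = None
for line in html.splitlines():  # html is the page source read with urllib2
    if '<title>' in line:
        Title = line.strip()    # whole line, tags included; split on '>' and '<' for the text
        break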

That's Python 2, I think. You can then strip the tags from the result.
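A literal '<title>' check per line misses titles that carry attributes, use upper case, or span several lines. The regular-expression route this answer suggests can be sketched with re.search instead. It is still not a real HTML parser: it assumes the first <title> in the page is the one wanted and can be fooled by a <title> string inside a comment or script. The sketch is written for Python 3, where page is a str, and the names TITLE_RE and extract_title_re are hypothetical.

```python
import html
import re

# Non-greedy match of everything between <title ...> and </title>.
# DOTALL lets the title span several lines; IGNORECASE accepts <TITLE>.
TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title_re(page):
    """Return the text of the first <title> in page (a str), or None if absent."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Collapse runs of whitespace, then decode entities such as &amp;
    return html.unescape(" ".join(match.group(1).split()))
```

For example, extract_title_re('<title lang="en">A</title>') returns 'A'.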

Answer 4:

Use Beautiful Soup.

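import urllib2
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3; with bs4 use: from bs4 import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string  # text inside the first <title> tag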
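If a third-party dependency is undesirable, the standard library's own HTML parser (html.parser in Python 3, the HTMLParser module in Python 2) can stand in for soup.title.string. A minimal Python 3 sketch, assuming only the text of the first matching element is needed; TagTextExtractor and get_tag_text are hypothetical names, and taking the tag name as a parameter is one way to cover other tags later, as the question anticipates. Unlike Beautiful Soup it builds no tree, so markup nested inside the element contributes only its text.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text of the first occurrence of one tag (e.g. 'title')."""

    def __init__(self, tag):
        # convert_charrefs=True (the default) passes decoded text to handle_data
        super().__init__(convert_charrefs=True)
        self.tag = tag.lower()
        self.inside = False  # currently between <tag> and </tag>
        self.done = False    # first occurrence already captured
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case
        if tag == self.tag and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def get_tag_text(page, tag="title"):
    """Return the whitespace-normalised text of the first <tag>, or None."""
    parser = TagTextExtractor(tag)
    parser.feed(page)
    parser.close()  # flush any buffered text
    if not parser.parts and not parser.done:
        return None
    return " ".join("".join(parser.parts).split())
```

Usage would be get_tag_text(html) for the title, or get_tag_text(html, "h1") for another tag, where html is the decoded page source.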

Source: https://stackoverflow.com/questions/1660302/python-fetching-title

Tags

python

urllib2
