Scrape title by only downloading relevant part of webpage

深忆病人 2021-02-05 10:45

I would like to scrape just the title of a webpage using Python. I need to do this for thousands of sites so it has to be fast. I've seen previous questions like retrieving jus

6 answers

庸人自扰 (original poster)

2021-02-05 11:24

You're scraping webpages with standard HTTP GET requests, and the title is just one element inside the HTML document the server sends back. I'm not aware of any request that returns only the title, so I don't think it's possible.
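
No request returns only the title, but one partial workaround (not part of this answer) is to read the response in small chunks and close the connection as soon as `</title>` has been parsed, so most of a large page is never read. The minimal sketch below uses only the standard library (`urllib.request`, `html.parser`, `codecs`) rather than `requests`/BeautifulSoup. The names `TitleParser`, `title_from_chunks` and `fetch_title`, the 1 KiB chunk size, the 64 KiB cap and the UTF-8 default are illustrative assumptions; pages in other encodings will get replacement characters, and some bytes past the title may already be in transit when the connection is closed.

```python
import codecs
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    @property
    def title(self):
        # None until the closing </title> has been parsed
        return "".join(self._parts).strip() if self.done else None


def title_from_chunks(chunks, max_bytes=64 * 1024, encoding="utf-8"):
    """Feed byte chunks to the parser; stop at </title> or after max_bytes.

    Assumption: the page is UTF-8 unless another encoding is passed in.
    """
    parser = TitleParser()
    # An incremental decoder copes with multi-byte characters split across chunks
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        parser.feed(decoder.decode(chunk))
        if parser.done or seen >= max_bytes:
            break  # stop consuming the stream early
    return parser.title


def fetch_title(url, chunk_size=1024, max_bytes=64 * 1024, timeout=10):
    """Read the response in small chunks; leaving the with-block closes it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # resp.read() returns b"" at end of stream, which ends the iterator
        chunks = iter(lambda: resp.read(chunk_size), b"")
        return title_from_chunks(chunks, max_bytes=max_bytes)
```

For example, `fetch_title("https://www.example.com")` should return that page's title, or `None` if no closing `</title>` appears within the first `max_bytes` bytes. Splitting the parsing into `title_from_chunks` keeps it independent of the network code, so any iterable of byte chunks can be fed to it.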

I know this doesn't necessarily help you get only the title, but I usually use BeautifulSoup for any web scraping because it makes parsing HTML much easier than doing it by hand. Here's an example.

Code:

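import requests
from bs4 import BeautifulSoup

urls = ["http://www.google.com", "http://www.msn.com"]

for url in urls:
    # A timeout stops one unresponsive site from stalling the whole loop
    r = requests.get(url, timeout=10)
    # Parse the downloaded HTML with Python's built-in parser
    soup = BeautifulSoup(r.text, "html.parser")

    if soup.title is None:  # some pages have no <title> element
        print(f"No title found for {url}\n")
        continue

    # soup.title is the whole <title> tag; .text is just the string inside it
    print(f"Title with tags: {soup.title}")
    print(f"Title: {soup.title.text}")
    print()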

Output:

Title with tags: <title>Google</title>
Title: Google

Title with tags: <title>MSN.com - Hotmail, Outlook, Skype, Bing, Latest News, Photos &amp; Videos</title>
Title: MSN.com - Hotmail, Outlook, Skype, Bing, Latest News, Photos & Videos
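
The loop above fetches pages one at a time, which will be slow for thousands of sites. Because the work is mostly waiting on the network, a thread pool can overlap the downloads. This is a minimal sketch using the standard library's `concurrent.futures`; `fetch_all_titles` is a hypothetical helper, `fetch` is whatever function you use to get one page's title, and the default of 20 workers is an arbitrary starting point, not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_titles(urls, fetch, max_workers=20):
    """Run fetch(url) for every URL concurrently and collect the results.

    Returns a dict mapping each URL to its title, or to the exception
    raised while fetching it, so one bad site doesn't abort the batch.
    """
    results = {}
    # Page downloads are I/O-bound, so threads overlap the time spent waiting
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # yields futures as they finish
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # e.g. timeouts, DNS errors, missing <title>
                results[url] = exc
    return results
```

For instance, `fetch` could be a small function that calls `requests.get(url, timeout=10)` and returns `BeautifulSoup(r.text, "html.parser").title.text`, as in the loop above. Storing the exception instead of re-raising it lets failed sites be reported afterwards without stopping the rest of the batch.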
