Scrape title by only downloading relevant part of webpage

深忆病人 2021-02-05 10:45

I would like to scrape just the title of a webpage using Python. I need to do this for thousands of sites, so it has to be fast. I've seen previous questions like retrieving jus

6 answers
  •  庸人自扰
    2021-02-05 11:24

    You're fetching webpages with ordinary HTTP GET requests, and I'm not aware of any request that returns only the title, so I don't think it's possible.

    I know this doesn't necessarily help get the title only, but I usually use BeautifulSoup for any web scraping. It's much easier. Here's an example.

    Code:

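    import requests
    from bs4 import BeautifulSoup

    urls = ["http://www.google.com", "http://www.msn.com"]

    for url in urls:
        # Download the full page; the timeout keeps one slow site from stalling the loop
        r = requests.get(url, timeout=10)
        soup = BeautifulSoup(r.text, "html.parser")

        # soup.title is the whole <title> tag; .text is only its contents
        print("Title with tags: %s" % soup.title)
        print("Title: %s" % soup.title.text)
        print()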
    

    Output:

    Title with tags: <title>Google</title>
    Title: Google
    
    Title with tags: <title>MSN.com - Hotmail, Outlook, Skype, Bing, Latest News, Photos &amp; Videos</title>
    Title: MSN.com - Hotmail, Outlook, Skype, Bing, Latest News, Photos & Videos
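
The example above downloads every page in full. No request returns only the title, but a client can read the response in small chunks and stop as soon as the closing title tag has arrived. The following is a minimal sketch of that idea, not a tested production scraper. The names title_from_chunks and fetch_title, the 1024-byte chunk size, the 64 KB cap, and the UTF-8 decoding are all assumptions made for illustration. It uses the standard library's urllib so it has no third-party dependencies; requests.get(url, stream=True) with iter_content() could play the same role.

```python
import re
from urllib.request import urlopen

# Matches a complete <title>...</title> element in raw bytes.
TITLE_RE = re.compile(rb"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_from_chunks(chunks, max_bytes=65536):
    """Return the <title> text from an iterable of byte chunks, or None.

    Stops consuming chunks as soon as a complete <title> element is seen,
    or once max_bytes have been buffered without finding one.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        match = TITLE_RE.search(buf)
        if match:
            # Assumption: the page is UTF-8; undecodable bytes are replaced.
            return match.group(1).decode("utf-8", errors="replace").strip()
        if len(buf) >= max_bytes:
            break
    return None


def fetch_title(url, chunk_size=1024, timeout=10):
    """Read a page in small chunks and stop once the title is found."""
    with urlopen(url, timeout=timeout) as resp:
        # iter(callable, sentinel) calls resp.read(chunk_size) repeatedly
        # until it returns b"", which marks the end of the response body.
        return title_from_chunks(iter(lambda: resp.read(chunk_size), b""))
```

Calling fetch_title("http://www.google.com") would return the title string, or None if no title shows up within the cap. How much this saves depends on how early the title appears in the page and on how much data the server has already sent before the connection is closed.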
    
