Regex pattern in Python for parsing HTML title tags

野性不改 2020-12-05 20:09

I am learning to use both the re module and the urllib module in Python and am attempting to write a simple web scraper. Here's the code I've written...

4 answers
  •  予麋鹿 (original poster)
    2020-12-05 20:44

    You could scrape a bunch of titles with a couple lines of gazpacho:

    from gazpacho import Soup
    
    urls = ["http://google.com", "https://facebook.com", "http://reddit.com"]
    
    # Fetch each page and collect the text of its first <title> element
    titles = []
    for url in urls:
        soup = Soup.get(url)  # download and parse the page
        title = soup.find("title", mode="first").text
        titles.append(title)
    

    This will output:

    titles
    ['Google',
     'Facebook - Log In or Sign Up',
     'reddit: the front page of the internet']
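
Since the question asks specifically about the re module, here is a minimal sketch of a regex-only approach for comparison. The names `TITLE_RE` and `extract_title` are made up for illustration and do not come from the original post, and the sample HTML is invented. A regex like this copes with simple pages but is not a general HTML parser, so treat it as a rough starting point rather than a robust solution.

```python
import html
import re

# Hypothetical pattern: matches <title ...>text</title>, ignoring case.
# DOTALL lets the title text span several lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page_source):
    """Return the text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(page_source)
    if match is None:
        return None
    # Collapse runs of whitespace, then decode entities such as &amp;.
    text = " ".join(match.group(1).split())
    return html.unescape(text)


# Invented sample input; no network access is needed to try this.
sample = "<html><head><TITLE>\n  Tom &amp; Jerry\n</TITLE></head></html>"
print(extract_title(sample))  # prints: Tom & Jerry
```

To use it on a live page, the source could be fetched with `urllib.request.urlopen(url).read()` and decoded to `str` before being passed to `extract_title`; the right encoding depends on the page, so that step is left out here.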
    
