Regex pattern in Python for parsing HTML title tags

野性不改 2020-12-05 20:09

I am learning to use both the re module and the urllib module in Python and am attempting to write a simple web scraper. Here's the code I've written…

4 Answers

予麋鹿 (OP)

2020-12-05 20:44

You could scrape a bunch of titles with a couple of lines of gazpacho:

from gazpacho import Soup

urls = ["http://google.com", "https://facebook.com", "http://reddit.com"]

titles = []
for url in urls:
    soup = Soup.get(url)
    title = soup.find("title", mode="first").text
    titles.append(title)

This will output:

titles
['Google',
 'Facebook - Log In or Sign Up',
 'reddit: the front page of the internet']
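
If you would rather stay with the standard library, as the question's mention of re suggests, here is a minimal sketch of a regex-based approach. The names TITLE_RE and extract_title and the sample HTML string are my own, purely for illustration, and the pattern assumes a single well-formed <title>...</title> pair. Regexes are fragile on real-world HTML, so treat this as a starting point rather than a robust parser.

```python
import re

# Assumption: the page contains one well-formed <title>...</title> pair.
# IGNORECASE handles <TITLE>; DOTALL lets the title span several lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(match.group(1).split())


# Hypothetical sample document, used here instead of a live request.
sample = "<html><head><TITLE>\n  Example Domain\n</TITLE></head><body></body></html>"
print(extract_title(sample))
```

To use it on a live page, you could pass in the decoded response body, for example urllib.request.urlopen(url).read().decode("utf-8", errors="replace"), assuming the page is UTF-8 encoded.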
