How to extract URLs from an HTML page in Python [closed]


情深已故 2020-12-24 04:15

5 answers

暖寄归人 (original poster)

2020-12-24 05:14

You can use BeautifulSoup, as others have suggested. It can parse HTML, XML, and similar markup. For an overview of its features, see its documentation.

Example:

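import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.google.co.in/'

# Fetch the page (Python 2's urllib2 became urllib.request in Python 3)
conn = urllib.request.urlopen(url)
html = conn.read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')

# Print the href of every anchor tag that has one
for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print(link)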
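
If installing BeautifulSoup is not an option, roughly the same result can be had with only the standard library. The following is a minimal sketch, not part of the answer above: it swaps BeautifulSoup for `html.parser.HTMLParser` and also resolves relative links with `urllib.parse.urljoin`. The names `LinkCollector`, `extract_links`, and `sample`, and the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Turn relative links such as "/about" into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all anchor hrefs found in the html string, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, used here instead of a live page
sample = ('<a href="/about">About</a> '
          '<a name="top">no href</a> '
          '<a href="http://example.org/x">X</a>')
print(extract_links(sample, 'http://example.com/'))
```

Anchors without an `href` are skipped, just as in the BeautifulSoup loop. For a real page, the `html` argument would be the decoded response body and `base_url` the address it was fetched from.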
