How can I get href links from HTML using Python?

自闭症患者 2020-11-27 03:25
import urllib2

website = "WEBSITE"
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()  # read from the handle opened above

print html

So far so good.

But I wa

10 Answers
  •  忘掉有多难
    2020-11-27 03:53

    Take a look at Beautiful Soup, an HTML parsing library:

    http://www.crummy.com/software/BeautifulSoup/

    You will do something like this:

    import BeautifulSoup  # BeautifulSoup 3 module layout (Python 2)
    soup = BeautifulSoup.BeautifulSoup(html)
    # print the href attribute of every <a> tag
    for link in soup.findAll("a"):
        print link.get("href")
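
For readers on Python 3, where `urllib2` and the `print` statement no longer exist, the same idea can be sketched with only the standard library. This is not the Beautiful Soup approach from the answer: it swaps in `html.parser.HTMLParser`, and the names `HrefCollector` and `extract_hrefs` are hypothetical helpers made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased from the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all href values found in <a> tags, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    # Assumed sample markup; in practice html would be the downloaded page.
    sample = (
        '<p><a href="https://example.com/">one</a> '
        '<a name="x">no link</a> '
        '<a href="/two">two</a></p>'
    )
    print(extract_hrefs(sample))  # ['https://example.com/', '/two']
```

To run this on a real page, the `html` string would likely come from `urllib.request.urlopen(website).read()` decoded to text. If Beautiful Soup is preferred, the current release is imported as `from bs4 import BeautifulSoup`, and the loop from the answer carries over with `print(link.get("href"))`.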
    
