How can I get href links from HTML using Python?

自闭症患者 2020-11-27 03:25
import urllib2

website = "WEBSITE"  # placeholder for the page URL
openwebsite = urllib2.urlopen(website)  # Python 2: open the URL
html = openwebsite.read()  # read the raw HTML of the page

print html

So far so good.

But I want only the href links from the HTML.
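The snippet in the question uses `urllib2`, which exists only in Python 2. A minimal sketch of the same download step in Python 3, using the standard library's `urllib.request`. The function name `fetch_html` is hypothetical, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return it as str (hypothetical helper).

    Python 3 counterpart of urllib2.urlopen(url).read() from the question.
    """
    with urlopen(url) as response:
        # Use the charset the server declares; fall back to UTF-8 (assumption)
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (placeholder URL):
# html = fetch_html("https://example.com/")
# print(html)
```

Unlike Python 2's `read()`, Python 3 returns bytes, so the sketch decodes them before any text processing.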

10 Answers
  •  南方客 (OP)
    2020-11-27 03:46

    Simplest way for me:

    from urlextract import URLExtract
    import requests
    
    url = "https://sample.com/samplepage/"  # placeholder; requests needs the scheme
    req = requests.get(url)
    text = req.text
    # or, if you already have the HTML source:
    # text = "This is html for ex <a href='http://google.com/'>Google</a> <a href='http://yahoo.com/'>Yahoo</a>"
    text = text.replace(' ', '').replace('=', '')  # strip all spaces and '=' signs before extraction
    extractor = URLExtract()
    print(extractor.find_urls(text))
    
    

    output:

    ['http://google.com/', 'http://yahoo.com/']
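URLExtract looks for URL-like strings anywhere in the text rather than reading `href` attributes, so it may also pick up URLs from plain text or scripts and may miss relative links. If only the `href` values of `<a>` tags are wanted, a minimal sketch with the standard library's `html.parser` (a different technique from the answer above; the names `HrefCollector` and `extract_hrefs` are hypothetical):

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs, value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html: str) -> List[str]:
    """Return the href values of all <a> tags, in document order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = "This is html for ex <a href='http://google.com/'>Google</a> <a href='http://yahoo.com/'>Yahoo</a>"
print(extract_hrefs(sample))  # prints ['http://google.com/', 'http://yahoo.com/']
```

Because it parses tags instead of pattern-matching the raw text, this approach returns relative links such as `page.html` as-is and ignores URLs that are not inside an `href`.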
