How can I get href links from HTML using Python?

自闭症患者 2020-11-27 03:25

import urllib2  # Python 2 only; in Python 3 use urllib.request instead

website = "WEBSITE"
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()

print html

So far so good.

But I wa

10 Answers

南方客 (original poster)

2020-11-27 03:46

Simplest way for me:

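import requests
from urlextract import URLExtract

url = "http://sample.com/samplepage/"  # requests needs an explicit scheme
req = requests.get(url)
text = req.text
# or if you already have the html source:
# text = "This is html for ex <a href='http://google.com/'>Google</a> <a href='http://yahoo.com/'>Yahoo</a>"
# strip spaces and '=' before handing the text to the extractor
text = text.replace(' ', '').replace('=','')
extractor = URLExtract()
print(extractor.find_urls(text))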

output:

['http://google.com/', 'http://yahoo.com/']
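
If you would rather not install anything, here is a rough sketch of the same idea with only the standard library (Python 3). It reads the href attribute of each <a> tag instead of searching the text for URL-like strings, so relative links come through as well. The names HrefCollector and extract_hrefs and the sample HTML are my own placeholders for illustration, not part of the answer above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hrefs(html):
    """Return the href values of all <a> tags in the given HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed sample input, chosen to mirror the example above.
sample = (
    '<p>This is html for ex '
    '<a href="http://google.com/">Google</a> '
    '<a href="http://yahoo.com/">Yahoo</a></p>'
)
print(extract_hrefs(sample))
```

To use it on a fetched page, pass the decoded page source (for example req.text from the snippet above) to extract_hrefs instead of the sample string.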
