Extract all links from a web page using python



 3  565

死守一世寂寞 2020-12-28 11:03

Following the Introduction to Computer Science track at Udacity, I'm trying to write a Python script that extracts links from a page. Below is the code I used:

I got the fol

3 Answers

醉酒成梦 (OP)

2020-12-28 12:00
You can find every tag in htmlpage whose href attribute contains "http". BeautifulSoup's find_all method does this when you pass attrs={'href': re.compile("http")}, because a compiled regex in attrs is matched with search, so "http" can appear anywhere in the value:
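```
import re
from bs4 import BeautifulSoup

# htmlpage is the page's HTML as a string (e.g. the text you downloaded)
soup = BeautifulSoup(htmlpage, 'html.parser')
links = []
# Select every tag whose href contains "http", i.e. absolute links
for link in soup.find_all(attrs={'href': re.compile("http")}):
    links.append(link.get('href'))

print(links)
```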
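If BeautifulSoup isn't installed, a similar filter can be sketched with the standard library's html.parser. This is a minimal sketch, not the answer's method: LinkCollector, extract_links and the sample HTML are hypothetical names chosen for illustration. Like the regex above, the check `"http" in value` matches anywhere in the href and applies to any tag, so `<link>` tags are included and relative links such as /relative are skipped.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values containing "http"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for a bare attribute
        for name, value in attrs:
            if name == "href" and value and "http" in value:
                self.links.append(value)


def extract_links(htmlpage):
    """Return every href in htmlpage that contains "http", in document order."""
    parser = LinkCollector()
    parser.feed(htmlpage)
    parser.close()
    return parser.links


# Hypothetical sample page used only to demonstrate the filter
sample = (
    '<p><a href="http://example.com/a">A</a> '
    '<a href="/relative">B</a> '
    '<link href="https://example.com/style.css"></p>'
)
print(extract_links(sample))  # → ['http://example.com/a', 'https://example.com/style.css']
```

If you also need relative links, drop the "http" check and resolve each href against the page's URL with urllib.parse.urljoin.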
