Extract all links from a web page using Python

死守一世寂寞 2020-12-28 11:03

Following the Introduction to Computer Science track at Udacity, I'm trying to write a Python script to extract links from a page. Below is the code I used:

I got the fol

3 Answers
  •  醉酒成梦
    2020-12-28 12:00

    You can find every tag in htmlpage whose href attribute contains "http". To do this, call BeautifulSoup's find_all method and pass attrs={'href': re.compile("http")}:

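    import re
    from bs4 import BeautifulSoup

    # htmlpage is assumed to already hold the page's HTML source as a string
    soup = BeautifulSoup(htmlpage, 'html.parser')
    links = []
    # Match any tag whose href attribute contains "http"
    for link in soup.find_all(attrs={'href': re.compile("http")}):
        links.append(link.get('href'))

    print(links)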
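
Filtering on "http" skips relative links such as /docs/b.html. As a rough sketch of one way to include them, the variant below swaps BeautifulSoup for the standard library's html.parser and resolves each href against a base URL with urljoin. The names LinkCollector, extract_links, sample_html, and the example.com URLs are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors that have no href or an empty one
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all anchor targets in html as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page standing in for a downloaded document
sample_html = """
<a href="http://example.com/a">absolute</a>
<a href="/docs/b.html">relative</a>
<a name="anchor-without-href">no link</a>
"""

links = extract_links(sample_html, "http://example.com/index.html")
print(links)
```

The base URL passed to extract_links would normally be the address the page was fetched from, so that relative paths resolve to the same site.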
    
