Using Beautiful Soup to get the full URL in source code

误落风尘 2020-12-10 14:51

So I was looking at some source code and came across this bit of code:



        
2 Answers
  •  悲&欢浪女
    2020-12-10 15:25

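    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup  # the 'lxml' parser below also needs the lxml package installed

    r = requests.get("http://example.com")

    url = r.url  # this is the base url (the final URL after any redirects)
    data = r.content  # this is the content of the page
    soup = BeautifulSoup(data, 'lxml')
    temp_url = soup.find('a')['href']  # you need to modify this selector

    if temp_url.startswith(("http://", "https://")):  # the href is already a full url
        url = temp_url
    else:  # relative href: resolve it against the base url (plain concatenation breaks for hrefs like "page.html")
        url = urljoin(url, temp_url)

    print(url)  # this is your full url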
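If you need every link on the page rather than just the first one, a minimal sketch is to let `urllib.parse.urljoin` handle all the cases: it leaves absolute URLs alone and resolves root-relative (`/about`), path-relative (`img/pic.png`) and protocol-relative (`//cdn...`) hrefs against the base. The helper name `absolute_links` and the sample HTML are hypothetical, and it uses the stdlib `html.parser` instead of `lxml` only so it has no extra dependency. It also honours a `<base href>` tag if the page has one, since that changes what relative links resolve against.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def absolute_links(html, base_url):
    """Hypothetical helper: return the href of every <a> tag, resolved to a full URL."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed

    # A <base href="..."> element overrides the document URL for relative links.
    base_tag = soup.find("base", href=True)
    if base_tag:
        base_url = urljoin(base_url, base_tag["href"])

    # href=True skips <a> tags that have no href attribute (e.g. <a name="...">).
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    page = """
    <a href="https://other.org/x">absolute</a>
    <a href="/about">root-relative</a>
    <a href="img/pic.png">relative</a>
    <a href="//cdn.example.com/lib.js">protocol-relative</a>
    <a name="anchor-only">no href</a>
    """
    for link in absolute_links(page, "http://example.com/docs/index.html"):
        print(link)
```

With a real page you would pass `r.content` and `r.url` from the `requests` call above as the two arguments.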
    
