Question
Is there any way to make Python follow a link, such as a bit.ly link, and then scrape the page it leads to? On the page I am scraping, the only link I can get is one that redirects, and the information I need is at the redirect target.
Answer 1:
There are 3 types of redirection:
- HTTP: information in the response headers (status code 301, 302, or another 3xx)
- HTML: a <meta> tag in the HTML (Wikipedia: Meta refresh)
- JavaScript: code like window.location = new_url
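Only the HTTP kind shows up in the response headers; for the meta-refresh kind you have to read the page itself. Below is a minimal sketch using only the standard library's html.parser, assuming the tag uses the usual content="<delay>; url=<target>" form. MetaRefreshParser and meta_refresh_target are hypothetical helper names, not part of any library, and the sketch does not handle JavaScript redirections, which generally require actually running the page's scripts.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # html.parser lowercases attribute names; values may be None
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("http-equiv", "").lower() != "refresh":
            return
        # content is assumed to look like "5; url=http://example.com/"
        _, sep, rest = attrs.get("content", "").partition(";")
        if not sep:
            return  # only a delay: the page reloads itself, no new URL
        url = re.sub(r"^\s*url\s*=\s*", "", rest, flags=re.IGNORECASE)
        url = url.strip().strip("'\"")
        if url:
            self.target = url


def meta_refresh_target(html, base_url):
    """Return the absolute meta-refresh URL found in html, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    if parser.target is None:
        return None
    # relative targets are resolved against the page's own URL
    return urljoin(base_url, parser.target)
```

With requests you would pass it r.text and r.url of the page you fetched and, if it returns a URL, request that URL next.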
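requests follows HTTP redirections automatically and keeps the intermediate responses in r.history, while r.url holds the final URL. It does not follow HTML or JavaScript redirections.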
import requests
r = requests.get('http://' + 'bit.ly/english-4-it')
print(r.history)
print(r.url)
result:
[<Response [301]>, <Response [301]>]
http://helion.pl/ksiazki/english-4-it-praktyczny-kurs-jezyka-angielskiego-dla-specjalistow-it-i-nie-tylko-beata-blaszczyk,anginf.htm
BTW: Stack Overflow doesn't allow bit.ly links in posts, so I used string concatenation.
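If you would rather not depend on requests, roughly the same information (every hop plus the final URL) can be collected with the standard library's urllib.request by hooking its redirect handler. This is a swapped-in technique, not what the answer uses; RecordingRedirectHandler and resolve are hypothetical names. Like requests, it only follows HTTP redirections, not meta-refresh or JavaScript ones.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows HTTP redirects like the default handler, but records each hop."""

    def __init__(self):
        super().__init__()
        self.hops = []  # (status code, URL that answered with the redirect)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # called once per 3xx response, before the next request is made
        self.hops.append((code, req.full_url))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def resolve(url, timeout=10):
    """Return (hops, final_url) after following all HTTP redirections."""
    handler = RecordingRedirectHandler()
    # a subclass of HTTPRedirectHandler replaces the default one in the opener
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return handler.hops, response.url
```

For the link in the answer, hops, final = resolve('http://' + 'bit.ly/english-4-it') should give the intermediate (status, URL) pairs and the final URL, assuming the short link still resolves.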
Source: https://stackoverflow.com/questions/41310219/anyway-to-scrape-a-link-that-redirects