Any way to scrape a link that redirects?

Question

Is there any way to make Python "click" a link, such as a bit.ly link, and then scrape the link it leads to? On the page I am scraping, the only link I can get is one that redirects, and the information I need is on the page it redirects to.

Answer 1:

There are three types of redirection:

HTTP - in the response headers (status code 301, 302 or another 3xx, with the new URL in the Location header)
HTML - as a <meta> tag in the HTML (see Wikipedia: Meta refresh)
JavaScript - as code such as window.location = new_url
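
Only the first type is visible to an HTTP client on its own; the other two live in the body of the page, so they have to be found in the HTML after it is downloaded. Below is a rough standard-library sketch for that, assuming you already have the page HTML and its URL. The names find_client_side_redirect, MetaRefreshParser, REFRESH_RE and JS_RE are hypothetical (my own, not from any library). The JavaScript part is only a regex heuristic that catches literal assignments such as window.location = "..." or location.replace('...'); a target computed at runtime still needs a real browser.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical helper names; a heuristic sketch, not a complete redirect detector.

# Matches a meta refresh content value such as "0; url=/next" or "5;URL='next.html'"
REFRESH_RE = re.compile(r"^\s*\d+(?:\.\d+)?\s*[;,]\s*(?:url\s*=\s*)?(.+?)\s*$", re.IGNORECASE)

# Heuristic for literal JS redirects: window.location = "...", location.href = '...',
# location.replace("...") or location.assign("..."); computed URLs are not caught
JS_RE = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""
)


class MetaRefreshParser(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh_values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "refresh":
            self.refresh_values.append(values.get("content", ""))


def find_client_side_redirect(html, base_url):
    """Return the absolute URL of an HTML or JavaScript redirect in html, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    for content in parser.refresh_values:
        match = REFRESH_RE.match(content)
        if match:
            # Strip optional quotes around the URL, then resolve relative URLs
            return urljoin(base_url, match.group(1).strip("'\""))
    match = JS_RE.search(html)
    if match:
        return urljoin(base_url, match.group(1) or match.group(2))
    return None


page = '<meta http-equiv="refresh" content="0; url=/landing">'
print(find_client_side_redirect(page, "http://example.com/start"))
# prints http://example.com/landing
```

With requests you could pass it r.text and r.url after a request and, if it returns a URL, fetch that URL next.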

requests follows HTTP redirections automatically and keeps every intermediate response in r.history; the URL of the final page is in r.url.

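import requests

# GET follows HTTP redirects by default (allow_redirects=True)
r = requests.get('http://' + 'bit.ly/english-4-it')

print(r.history)  # the intermediate 3xx responses, oldest first
print(r.url)      # the URL of the final page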

Result:

[<Response [301]>, <Response [301]>]
http://helion.pl/ksiazki/english-4-it-praktyczny-kurs-jezyka-angielskiego-dla-specjalistow-it-i-nie-tylko-beata-blaszczyk,anginf.htm

BTW: Stack Overflow doesn't allow bit.ly links in posts, so I built the URL with string concatenation.
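
To see what happens underneath r.history, here is a rough standard-library sketch; it is not the code requests itself uses. It swaps in urllib.request, subclasses HTTPRedirectHandler so every hop is recorded, and runs against a throwaway local server whose /short, /middle and /final paths are hypothetical stand-ins for a shortener like bit.ly. Each 3xx response carries the next URL in its Location header, and the handler keeps following it until a non-redirect response arrives.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows 3xx redirects like the default handler but records each hop."""

    def __init__(self):
        super().__init__()
        self.chain = []  # (status code, from URL, to URL) for every redirect

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is the Location header, already resolved against req.full_url
        self.chain.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def resolve(url):
    """Return (final URL, list of hops), roughly like requests' r.url and r.history."""
    handler = RecordingRedirectHandler()
    # ProxyHandler({}) disables environment proxies so local URLs are hit directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), handler)
    with opener.open(url, timeout=10) as resp:
        return resp.url, handler.chain


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a link shortener: /short -> /middle -> /final."""

    redirects = {"/short": "/middle", "/middle": "/final"}

    def do_GET(self):
        if self.path in self.redirects:
            self.send_response(301)
            self.send_header("Location", self.redirects[self.path])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"the information you wanted"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

final_url, chain = resolve(base + "/short")
for code, src, dst in chain:
    print(code, src, "->", dst)
print(final_url)

server.shutdown()
server.server_close()
```

With a real short link you would call resolve() on the link you scraped and then scrape final_url. If you only need the target URL and not the page itself, requests.get(url, allow_redirects=False) followed by r.headers['Location'] gives just the first hop.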

Source: https://stackoverflow.com/questions/41310219/anyway-to-scrape-a-link-that-redirects

Tags

python

parsing

web-scraping

beautifulsoup

lxml
