Any way to scrape a link that redirects?

≯℡__Kan透↙ posted on 2021-02-07 09:47:58

Question


Is there any way to make Python follow a link, such as a bit.ly link, and then scrape the page it leads to? On the page I am scraping, the only link I can get is one that redirects, and the information I need is at the redirect target.


Answer 1:


There are three types of redirection:

  • HTTP - sent in the response headers (status code 301, 302, or another 3xx)
  • HTML - a <meta> tag in the HTML (Wikipedia: Meta refresh)
  • JavaScript - code such as window.location = new_url

requests follows HTTP redirects automatically and keeps the intermediate responses in r.history; r.url holds the final URL.

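import requests

# requests.get() follows HTTP (3xx) redirects by default
r = requests.get('http://' + 'bit.ly/english-4-it')

print(r.history)  # intermediate redirect responses, oldest first
print(r.url)      # final URL after all redirects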

Result:

[<Response [301]>, <Response [301]>]
http://helion.pl/ksiazki/english-4-it-praktyczny-kurs-jezyka-angielskiego-dla-specjalistow-it-i-nie-tylko-beata-blaszczyk,anginf.htm
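
To see where each hop in r.history actually went, the status codes and URLs can be collected into one list. This is a minimal sketch; the helper name redirect_chain is made up for illustration and is not part of requests. It only relies on the status_code, url and history attributes that a requests Response provides.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, final response last.

    `response` is expected to look like a requests.Response:
    it needs .status_code, .url and .history (a list of earlier responses).
    """
    # Each entry in .history is one redirect response that was followed
    hops = [(r.status_code, r.url) for r in response.history]
    # The response itself is the end of the chain
    hops.append((response.status_code, response.url))
    return hops
```

With the request above, redirect_chain(r) would return one tuple per entry in r.history followed by a tuple for the final page.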

BTW: Stack Overflow doesn't allow bit.ly links in post text, so I built the URL by concatenation.
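
Of the three types listed above, requests only follows the HTTP kind. For the HTML and JavaScript kinds, the target has to be pulled out of the page body. The following is a rough sketch using only the standard library; the names MetaRefreshParser, JS_REDIRECT and find_client_side_redirect are made up for illustration. It assumes the JavaScript redirect is a plain string assignment to location. A URL that is computed by a script will not be found this way and would need a real browser (for example via Selenium).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.com/"
        content = attrs.get("content") or ""
        match = re.search(r"url\s*=\s*['\"]?([^'\"]+)", content, re.I)
        if match:
            self.target = match.group(1).strip()


# Matches simple assignments such as:
#   window.location = "http://..."   or   location.href = '/path'
JS_REDIRECT = re.compile(r"\blocation(?:\.href)?\s*=\s*['\"]([^'\"]+)['\"]")


def find_client_side_redirect(html, base_url):
    """Return the absolute redirect target found in `html`, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    target = parser.target
    if target is None:
        # No meta refresh found, so fall back to a simple JavaScript pattern
        match = JS_REDIRECT.search(html)
        target = match.group(1) if match else None
    # Relative targets are resolved against the page's own URL
    return urljoin(base_url, target) if target else None
```

A possible way to use it: fetch the page with r = requests.get(url), call find_client_side_redirect(r.text, r.url), and if the result is not None, request that URL in turn.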



Source: https://stackoverflow.com/questions/41310219/anyway-to-scrape-a-link-that-redirects
