requests.history not showing all redirects

Submitted by 夙愿已清 on 2019-12-04 06:32:16

Question


I'm trying to get the redirects of some Wikipedia pages, and something curious is happening.

If I run:

>>> request = requests.get("https://en.wikipedia.org/wiki/barcelona", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Barcelona'
>>> request.history
[<Response [301]>]

As you can see, the redirect is followed correctly, and I get the same URL in Python as in the browser.

But if I try:

>>> request = requests.get("https://en.wikipedia.org/wiki/Yardymli_Rayon", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Yardymli_Rayon'
>>> request.history
[]

the history is empty, yet in the browser I see that the URL has changed to: https://en.wikipedia.org/wiki/Yardymli_District

Does anyone know how to solve this?


Answer 1:


Requests doesn't show the redirect because you're not actually being redirected in the HTTP sense. Wikipedia does some JavaScript trickery (probably HTML5 history modification and pushState) to change the address that's shown in the address bar, but that doesn't apply to Requests, of course.

In other words, both requests and your browser are correct: requests is showing the URL you actually requested (and Wikipedia actually served), while your browser's address bar is showing the 'proper', canonical URL.

You could parse the response and look for the <link rel="canonical"> tag if you want to find out the 'proper' URL from your script, or fetch articles over Wikipedia's API instead.
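As a rough sketch of the first suggestion, the canonical link can be pulled out of the page with the standard library's html.parser. The names CanonicalLinkParser, find_canonical_url and fetch_canonical_url are made up for this illustration, and it assumes the page really does carry a <link rel="canonical"> tag in its markup; fetch_canonical_url is only there to show how it might be wired up to Requests.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only until a canonical one is found.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical_url(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


def fetch_canonical_url(url):
    """Hypothetical helper: download `url` and return its canonical URL."""
    import requests  # imported here so the parsing helpers work without it

    response = requests.get(url)
    response.raise_for_status()
    return find_canonical_url(response.text)
```

Calling fetch_canonical_url("https://en.wikipedia.org/wiki/Yardymli_Rayon") should then give you the URL your browser ends up showing, provided the tag is present. If it returns None, you would keep response.url as the fallback.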



Source: https://stackoverflow.com/questions/31939197/requests-history-not-showing-all-redirects
