Request npm: Handling Redirects

Submitted by 风流意气都作罢 on 2021-02-04 07:57:59

Question


I was wondering if anyone knew how to handle redirects with the Request npm package for links from sites such as bitly or tribal, or for Twitter's t.co URLs. For example, if I have a web page that I want to scrape with Request, and the link I have to that page is a bitly or other shortened URL that is going to redirect me, how do I handle those redirects?

I found that Request has a "followRedirect" option that is set to true by default. If I set it to false, I can get the next link that the page will redirect me to by scraping the page that is returned, but that isn't ideal because I don't know how many redirects I will have to go through.

Right now I am getting a 500 error when "followRedirect" is set to true. When "followRedirect" is set to false, I can get each redirect page, but again, I don't know how many redirect pages I will have to go through. Code is below:

var request = require('request');

var options = {
  followRedirect: false
};

request('http://t.co/gJ74UfmH4i', options, function(err, response, body){
  // with followRedirect: false, I get the redirect page
  // with the default options, I get a 500
});

Answer 1:


First, get the final URL in the redirect chain by using the followAllRedirects: true option:

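request('http://t.co/gJ74UfmH4i', {
  method: 'HEAD',
  followAllRedirects: true
}, function(err, response, body) {
  if (err) { return console.error(err); }
  // href of the last request in the chain, i.e. the final URL
  var url = response.request.href;
});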


Second, make a request to the final URL with some browser-like headers:

// `url` is the final URL from the first step, so run this inside that callback
request(url, {
  headers: {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.46 Safari/537.36"
  }
}, function(err, response, body) {
  // body holds the page content
});
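Since the url variable from the first step only exists inside its callback, the two steps have to be nested. Here is a minimal sketch of how that could look. The helper name resolveThenFetch is hypothetical, and the request module is passed in as an argument (for example require('request')) rather than loaded inside the helper; this is one possible arrangement, not something the Request package provides.

```javascript
// Browser-like User-Agent string, same as in the answer above.
var USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.46 Safari/537.36';

// Hypothetical helper: resolve a short URL, then fetch the final page.
// `request` is the Request module (or anything with the same call shape).
function resolveThenFetch(request, shortUrl, callback) {
  // Step 1: HEAD request that follows redirects, to learn the final URL.
  request(shortUrl, { method: 'HEAD', followAllRedirects: true }, function (err, response) {
    if (err) { return callback(err); }
    var finalUrl = response.request.href;

    // Step 2: GET the final URL with a browser-like header.
    request(finalUrl, { headers: { 'User-Agent': USER_AGENT } }, function (err2, response2, body) {
      if (err2) { return callback(err2); }
      callback(null, finalUrl, body);
    });
  });
}
```

It would be called as resolveThenFetch(require('request'), 'http://t.co/gJ74UfmH4i', function (err, finalUrl, body) { ... }).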



Answer 2:


The Request package follows HTTP 3xx redirects by default, but the URL you are using returns an HTTP 200 with a META refresh style of redirect. I'm not sure whether Request supports this particular style of redirect, so you may need to parse the response and follow it manually.

GET http://t.co/gJ74UfmH4i HTTP/1.1

HTTP/1.1 200 OK
cache-control: private,max-age=300
content-length: 208
content-type: text/html; charset=utf-8
date: Fri, 28 Aug 2015 16:28:59 GMT
expires: Fri, 28 Aug 2015 16:33:59 GMT
server: tsa_b
set-cookie: muc=b0a729d6-9a30-466c-9cd9-57306369613f; Expires=Wed, 09 Aug 2017 16:28:59 GMT; Domain=t.co
x-connection-hash: 28133ba91da8c83d45afa434e12f8a72
x-response-time: 9
x-xss-protection: 1; mode=block

<noscript><META http-equiv="refresh" content="0;URL=http://nyti.ms/1EmZJhP"></noscript><title>http://nyti.ms/1EmZJhP</title><script>window.opener = null; location.replace("http:\/\/nyti.ms\/1EmZJhP")</script>
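If you do end up following it manually, a rough sketch of the parsing step might look like this. Both helper names (extractMetaRefreshUrl and fetchFollowingMetaRefresh) are hypothetical, the request module is passed in as an argument, and the regular expression is an assumption that only covers tags shaped like the one in the response above (http-equiv before content, absolute URL). A real HTML parser would be more robust.

```javascript
// Hypothetical helper: pull the target URL out of a META refresh tag.
// Returns null when the body contains no such tag.
function extractMetaRefreshUrl(html) {
  // Matches e.g. <META http-equiv="refresh" content="0;URL=http://example.com/">
  var pattern = /<meta[^>]*http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)/i;
  var match = pattern.exec(html);
  return match ? match[1] : null;
}

// Hypothetical helper: fetch a URL and keep following META refresh
// redirects, up to maxHops times. `request` is the Request module.
function fetchFollowingMetaRefresh(request, url, maxHops, callback) {
  request(url, function (err, response, body) {
    if (err) { return callback(err); }
    var next = extractMetaRefreshUrl(String(body || ''));
    if (next && maxHops > 0) {
      // Relative URLs are not resolved here; this assumes an absolute target.
      return fetchFollowingMetaRefresh(request, next, maxHops - 1, callback);
    }
    callback(null, response, body);
  });
}
```

The maxHops limit is there so that two pages refreshing to each other cannot loop forever.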

One possible route to understanding the issue would be to use a function for followRedirect to see if you can find out where it's failing.

From the README:

followRedirect - follow HTTP 3xx responses as redirects (default: true). This property can also be implemented as function which gets response object as a single argument and should return true if redirects should continue or false otherwise.
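As a sketch of that idea, the function below records each response it is asked about and stops after a fixed number of hops. The name createRedirectTracker and the maxHops limit are my own assumptions for illustration; the only thing taken from the README is that followRedirect may be a function that receives the response object and returns true or false.

```javascript
// Hypothetical helper: builds a followRedirect function that logs every
// redirect response it sees and refuses to continue past maxHops.
function createRedirectTracker(maxHops) {
  var hops = [];
  return {
    hops: hops,
    followRedirect: function (response) {
      hops.push({
        statusCode: response.statusCode,
        location: response.headers && response.headers.location
      });
      // true = keep following, false = stop here
      return hops.length <= maxHops;
    }
  };
}

// Possible usage with Request (not executed here):
// var request = require('request');
// var tracker = createRedirectTracker(10);
// request('http://t.co/gJ74UfmH4i', { followRedirect: tracker.followRedirect },
//   function (err, response, body) {
//     console.log(tracker.hops); // inspect where the chain went
//   });
```

Looking at tracker.hops after the request finishes (or fails) should show which hop was the last one reached before the 500.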



Source: https://stackoverflow.com/questions/32275621/request-npm-handling-redirects
