Question
I was wondering if anyone knows how to handle redirects with the Request npm package for links from shorteners such as bitly, tribal, or Twitter's t.co. For example, if I want to scrape a web page with Request and the only link I have to that page is a bitly or other shortened URL that redirects, how do I handle those redirects?
I found that Request has a "followRedirect" option that is set to true by default. If I set it to false, I can get the next link in the chain by scraping the page that is returned, but that isn't ideal because I don't know how many redirects I will have to go through.
Right now I am getting a 500 error when "followRedirect" is set to true. When it is set to false, I can get each redirect page, but again, I don't know how many redirect pages I will have to go through. Code is below:
var request = require('request');

var options = {
  followRedirect: false
};
request('http://t.co/gJ74UfmH4i', options, function(err, response, body){
  // when options are set I get the redirect page
  // when options are not set I get a 500
});
Answer 1:
First, you need to get the final URL in the redirect chain, using the followAllRedirects: true option:
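request('http://t.co/gJ74UfmH4i', {
  method: 'HEAD',
  followAllRedirects: true
}, function(err, response, body) {
  if (err) { return console.error(err); }
  // the URL the request ended up at after following all redirects
  var url = response.request.href;
});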
The second step is to request the final URL with some browser-like headers. Note that url is only set inside the first callback, so this request has to be made from there:
request(url, {
  headers: {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.46 Safari/537.36"
  }
}, function(err, response, body) {
  // here is your body
});
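Since the final URL is only known inside the first callback, the two steps above could be chained roughly as follows. This is a minimal sketch, not a tested recipe: fetchFinalPage is a hypothetical helper name, requestFn is assumed to be the function exported by the request package (passed in as a parameter so the sketch does not depend on how it is loaded), and the shortened User-Agent string is only a placeholder.

```javascript
// Hypothetical helper combining both steps of this answer.
// `requestFn` is assumed to be the request package's export, e.g. require('request').
function fetchFinalPage(requestFn, shortUrl, callback) {
  // Step 1: HEAD request that follows redirects to discover the final URL.
  requestFn(shortUrl, { method: 'HEAD', followAllRedirects: true }, function (err, response) {
    if (err) { return callback(err); }
    var finalUrl = response.request.href;

    // Step 2: GET the final URL with a browser-like User-Agent (placeholder value).
    requestFn(finalUrl, { headers: { 'User-Agent': 'Mozilla/5.0' } }, function (err2, response2, body) {
      if (err2) { return callback(err2); }
      callback(null, finalUrl, body);
    });
  });
}
```

It would be called as fetchFinalPage(require('request'), 'http://t.co/gJ74UfmH4i', function (err, finalUrl, body) { ... }).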
Answer 2:
The Request package follows HTTP 3xx redirects by default, but the URL you are using returns an HTTP 200 with a META refresh style of redirect. I'm not sure whether Request supports this particular style of redirect, so you may need to parse the response and follow it manually.
GET http://t.co/gJ74UfmH4i HTTP/1.1
HTTP/1.1 200 OK
cache-control: private,max-age=300
content-length: 208
content-type: text/html; charset=utf-8
date: Fri, 28 Aug 2015 16:28:59 GMT
expires: Fri, 28 Aug 2015 16:33:59 GMT
server: tsa_b
set-cookie: muc=b0a729d6-9a30-466c-9cd9-57306369613f; Expires=Wed, 09 Aug 2017 16:28:59 GMT; Domain=t.co
x-connection-hash: 28133ba91da8c83d45afa434e12f8a72
x-response-time: 9
x-xss-protection: 1; mode=block
<noscript><META http-equiv="refresh" content="0;URL=http://nyti.ms/1EmZJhP"></noscript><title>http://nyti.ms/1EmZJhP</title><script>window.opener = null; location.replace("http:\/\/nyti.ms\/1EmZJhP")</script>
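Following this kind of redirect manually means pulling the target URL out of the returned HTML. A minimal sketch of that parsing step is below. extractMetaRefreshUrl is a hypothetical helper, not part of Request, and the regular expression assumes the http-equiv attribute appears before content, as it does in the response above. A real HTML parser would be more robust.

```javascript
// Hypothetical helper: pull the target URL out of a META refresh tag.
// Returns null when the HTML contains no such tag.
function extractMetaRefreshUrl(html) {
  // Matches e.g. <META http-equiv="refresh" content="0;URL=http://example.com/">
  // case-insensitively; assumes http-equiv comes before content.
  var pattern = /<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>\s]+)/i;
  var match = pattern.exec(html);
  return match ? match[1] : null;
}

// The relevant part of the body returned by t.co in the dump above.
var sample = '<noscript><META http-equiv="refresh" content="0;URL=http://nyti.ms/1EmZJhP"></noscript>';
console.log(extractMetaRefreshUrl(sample)); // prints http://nyti.ms/1EmZJhP
```

The extracted URL can then be passed to another request call, repeating the check on each response until no META refresh tag is found.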
One possible route to understanding the issue would be to use a function for followRedirect to see if you can find out where it's failing.
From the README:
followRedirect - follow HTTP 3xx responses as redirects (default: true). This property can also be implemented as function which gets response object as a single argument and should return true if redirects should continue or false otherwise.
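As a rough illustration of the function form described in the README excerpt, the sketch below builds a followRedirect function that logs each redirect response and stops after a fixed number of hops. makeRedirectLogger and maxHops are hypothetical names introduced here. The sketch assumes the function is called once per redirect response, which is how I read the README, and that the response exposes statusCode and headers.location.

```javascript
// Hypothetical factory: returns a function suitable for the followRedirect
// option. It logs every redirect response and allows at most maxHops hops.
function makeRedirectLogger(maxHops) {
  var hops = 0;
  return function (response) {
    hops += 1;
    console.log('redirect ' + hops + ': ' + response.statusCode +
      ' -> ' + response.headers.location);
    // true means keep following, false means stop at this response
    return hops <= maxHops;
  };
}

// Usage with Request (not executed here; requires the request package):
// request('http://t.co/gJ74UfmH4i', { followRedirect: makeRedirectLogger(10) },
//   function (err, response, body) { /* inspect err, response, or body */ });
```

The logged status codes and Location headers should show how far the chain gets before the 500 appears.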
Source: https://stackoverflow.com/questions/32275621/request-npm-handling-redirects