Question
Using: Delphi 2010, latest version of Indy
I am trying to scrape data from Google's AdSense web page in order to retrieve the reports, but so far without success: the application stops after the first request and does not proceed.
Using Fiddler to watch the traffic while a web browser loads the AdSense page, I can see that the browser's request goes through a number of redirects before the page is loaded.
However, my Delphi application only generates a couple of requests before it stops.
Here are the steps I have followed:
- Drop an IdHTTP and an IdSSLIOHandlerSocketOpenSSL component on the form.
- Set the IdHTTP component's AllowCookies and HandleRedirects properties to True, and its IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
- Set the IdSSLIOHandlerSocketOpenSSL1 component's SSLOptions.Method property to sslvSSLv23.
Finally I have this code:
procedure TfmMain.GetUrlToFile(AURL, AFile: String);
var
  Output: TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    // NB: this passes the FURL field; the AURL parameter is never used.
    IdHTTP1.Get(FURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
However, it does not reach the login page as expected. I would expect it to behave like a web browser and follow the redirects until it reaches the final page.
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block
Firstly, is there anything wrong with this output?
Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?
Answer 1:
IdHTTP component property values prior to making the call:
Name := 'IdHTTP1';
IOHandler := IdSSLIOHandlerSocketOpenSSL1;
AllowCookies := True;
HandleRedirects := True;
RedirectMaximum := 35;
Request.UserAgent := 'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8';
HTTPOptions := [hoForceEncodeParams];
OnRedirect := IdHTTP1Redirect;
CookieManager := IdCookieManager1;
Redirect event handler:
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  Handled := True;
end;
Making the call:
FURL := 'https://www.google.com';
GetUrlToFile(FURL + '/adsense/', 'a.html');

procedure TfmMain.GetUrlToFile(AURL, AFile: String);
var
  Output: TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    try
      IdHTTP1.Get(AURL, Output);
      IdHTTP1.Disconnect;
    except
      // Any exception is swallowed here, so a failed request still
      // writes whatever was received so far to the file.
    end;
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
Here's the (request and response headers) output from Fiddler:

Answer 2:
Getting redirects going
Set TIdHTTP.HandleRedirects := True so it starts automatically handling redirects. TIdHTTP.RedirectMaximum is used to set how many successive redirects should be handled.
Alternatively you may assign TIdHTTP.OnRedirect and set Handled := True from that handler. This is what I'm doing in a project that needs to read data from a WikiMedia web site (my own site).
About the HTTP response
Nothing is wrong with that response: it's a very basic redirect to https://encrypted.google.com/, and it also sets some cookies. TIdHTTP should go to the given page in response.
Other suggestions
Don't forget to assign a CookieManager, and make sure you use the same CookieManager for all subsequent requests. If you don't, you'll probably get redirected to the login page over and over again.
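As a rough illustration of these suggestions, the sketch below creates the components at run time and takes the cookie manager as a parameter, so that the caller can pass the same instance to every request. FetchToFile and the unit name are hypothetical; the property values simply mirror the settings mentioned in this thread, and nothing here is specific to AdSense.

```pascal
unit RedirectSketch;

interface

uses
  Classes, IdHTTP, IdSSLOpenSSL, IdCookieManager;

// Hypothetical helper: downloads AURL into AFile, following redirects and
// keeping cookies in the caller-supplied ACookies instance.
procedure FetchToFile(ACookies: TIdCookieManager; const AURL, AFile: string);

implementation

procedure FetchToFile(ACookies: TIdCookieManager; const AURL, AFile: string);
var
  HTTP: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
  Output: TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    HTTP := TIdHTTP.Create(nil);
    try
      // HTTP owns the SSL handler, so freeing HTTP frees it as well.
      SSL := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
      SSL.SSLOptions.Method := sslvSSLv23;
      HTTP.IOHandler := SSL;

      HTTP.AllowCookies := True;
      HTTP.CookieManager := ACookies;  // shared by every call
      HTTP.HandleRedirects := True;
      HTTP.RedirectMaximum := 35;      // same limit as in the first answer

      HTTP.Get(AURL, Output);
      Output.SaveToFile(AFile);
    finally
      HTTP.Free;
    end;
  finally
    Output.Free;
  end;
end;

end.
```

The caller would create one TIdCookieManager, pass it to each FetchToFile call, and free it only after the last request, so that cookies set by one response are available to the next request.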
Answer 3:
In my case I needed to fix dest, because it somehow contained a ';'!
procedure Tfrm1.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
var
  i: Integer;
begin
  // Cut dest off at the first ';', if there is one.
  i := Pos(';', dest);
  if i > 0 then
  begin
    dest := Copy(dest, 1, i - 1);
  end;
  Handled := True;
end;
Source: https://stackoverflow.com/questions/4549809/indy-idhttp-how-to-handle-page-redirects