Perl Mechanize: Get the response page after the page is modified?

时光怂恿深爱的人放手 submitted on 2020-04-11 12:14:07

Question


I am trying to retrieve a page that uses JavaScript and a database to load its content. Loading takes about 2 to 3 minutes. I can get the page that shows "Please wait 2 to 3 mins for the page to be loaded.", but I cannot retrieve the page after it has finished loading.

I have already tried the following:

1.) Using the mirror method in Mechanize. But the response content is not decoded, so the saved file is gibberish. (I also tried writing a method similar to mirror that decodes the response content, but that doesn't work either: the new content is not loaded.)

2.) Adding an 'If-Modified-Since' request header. But the timestamp stays the same and the new content is not fetched.

Any pointers or suggestions would really be helpful.

TIA :)


Answer 1:


It won't work with Mechanize alone. You first need to check what the JavaScript does to the page and where the data come from. Then there are two possibilities:

  • Mimic the JavaScript in Perl: after fetching the initial page (the one shown before loading finishes), request the new data from the same place the JavaScript downloads it from. Check whether the data are encoded in some way, and if so, decode them in Perl.
  • Use WWW::Mechanize::Firefox. Then you don't need to care about the JavaScript, because Firefox runs it. You can hide the application window if you don't want to see it.
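For the first option, a minimal sketch might look like the following. It makes several assumptions. First, you have found the URL the JavaScript requests its data from, for example in the browser's developer tools network tab. Second, that endpoint returns JSON. Third, you fetch it with a plain WWW::Mechanize object. The URL and the helper name `payload_from_response` are hypothetical. Decoding with `decoded_content(charset => 'none')` first removes any Content-Encoding such as gzip. An undecoded gzip body is one possible reason, though not confirmed by the question, why a raw mirrored file looks like gibberish.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: turn an HTTP::Response (e.g. $mech->response after
# $mech->get) into a Perl data structure, assuming the body is JSON.
sub payload_from_response {
    my ($res) = @_;
    die 'Request failed: ' . $res->status_line . "\n" unless $res->is_success;
    # charset => 'none' undoes any Content-Encoding such as gzip but keeps
    # the body as raw bytes, which JSON::PP then decodes as UTF-8.
    my $bytes = $res->decoded_content(charset => 'none');
    return JSON::PP->new->utf8->decode($bytes);
}

# Usage, with a hypothetical data URL found in the network tab:
#   use WWW::Mechanize;
#   my $mech = WWW::Mechanize->new;
#   $mech->get('http://example.com/ajax-data.json');
#   my $data = payload_from_response($mech->response);
```

If the JavaScript polls that endpoint until the data are ready, the Perl code would need to repeat the request in the same way.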

Example of the second option:

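use strict;
use warnings;
use WWW::Mechanize::Firefox;
use HTML::TreeBuilder::LibXML;

# Firefox loads the page and runs its JavaScript
my $mech = WWW::Mechanize::Firefox->new;
$mech->get('http://example.com/ajax.html');

# Parse the HTML as Firefox currently renders it
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($mech->content);
$tree->eof;
# Text of the table inside the 10th <div> of <body>
my $something = $tree->findvalue('/html/body/div[10]/table');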

The code above is untested, but it should work.
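The example reads `$mech->content` right after `get`. Because the page in the question needs 2 to 3 minutes to finish, the content may still show the placeholder at that point. The sketch below re-reads the HTML until the "Please wait" text quoted in the question disappears. It assumes that `$mech->content` returns the page as Firefox currently renders it, which is also what the answer relies on. The helper name `wait_for_loaded_html`, the placeholder match, and the timeout values are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Placeholder text quoted in the question; adjust it to the real page.
use constant PLACEHOLDER => 'Please wait 2 to 3 mins';

# Hypothetical helper: call $get_html->() until its result no longer contains
# the placeholder, or until $timeout seconds have passed.
# Returns the loaded HTML, or undef on timeout.
sub wait_for_loaded_html {
    my ($get_html, %opt) = @_;
    my $timeout  = $opt{timeout}  // 300;    # the question mentions 2 to 3 minutes
    my $interval = $opt{interval} // 5;      # seconds between checks
    my $deadline = time + $timeout;
    while (1) {
        my $html = $get_html->();
        return $html if index($html, PLACEHOLDER) < 0;
        return undef if time + $interval > $deadline;
        sleep $interval;
    }
}

# Usage with the WWW::Mechanize::Firefox object from the example:
#   $mech->get('http://example.com/ajax.html');
#   my $html = wait_for_loaded_html(sub { $mech->content })
#       or die "Page did not finish loading\n";
#   $tree->parse($html);
```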

Enjoy.



Source: https://stackoverflow.com/questions/25129159/perl-mechanize-get-the-response-page-after-the-page-is-modified
