Question:
I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows.
I got part of the way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook, which has IE8, and now I can't even fetch a simple page.
Is Win32::IE::Mechanize up to date with the latest versions of IE?
But, more to the point, given a recent WinXP machine, what's the quickest, easiest way to scrape a site which only reveals its content via JavaScript?
Answer 1:
WWW::Selenium.
- It lets you specify which browser to use (IE and Firefox are supported out of the box)
- It supports locating elements via XPath expressions, table IDs, link text (with regex matching!) and URLs
- It provides a Swiss army knife of user-interaction options, giving you flexibility in how you simulate end-user browsing
You'll need to download the Selenium Remote Control and have it running in the background for the module to work.
It may not be a good option if your page load times are unpredictable.
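A minimal sketch of driving IE through WWW::Selenium, assuming the Selenium Remote Control is already running on its default host and port (localhost:4444); the URL is a placeholder:

```perl
use strict;
use warnings;
use WWW::Selenium;

# Connect to a running Selenium RC server and launch IE.
my $sel = WWW::Selenium->new(
    host        => 'localhost',
    port        => 4444,
    browser     => '*iexplore',              # or '*firefox'
    browser_url => 'http://www.example.com/',
);

$sel->start;
$sel->open('http://www.example.com/');
$sel->wait_for_page_to_load(30_000);         # ms; tune for slow pages

# The DOM as rendered *after* JavaScript has run.
my $html = $sel->get_html_source;
print $html;

$sel->stop;
```

The `wait_for_page_to_load` timeout is the knob to watch when load times are unpredictable, as noted above.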
Answer 2:
Have a look at Win32::Watir. It's a newer module and explicitly supports IE 6, 7 and 8.
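A rough sketch of the Win32::Watir approach; the method names here (`new`, `goto`, `html`, `close`) reflect the module's Watir-style API as documented, but verify them against the version you install, since this module has seen several interface revisions:

```perl
use strict;
use warnings;
use Win32::Watir;

# Drive a visible IE window via OLE automation.
my $ie = Win32::Watir->new( visible => 1 );

$ie->goto('http://www.example.com/');

# Page source after IE has executed the page's JavaScript.
print $ie->html;

$ie->close;
```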
Answer 3:
I don't see any mention of WWW::Mechanize, so I'll bring it up for completeness (note that it does not execute JavaScript itself). Selenium is also becoming very popular and can be used in a lot of testing scenarios.
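For reference, the basic WWW::Mechanize pattern looks like this (URL is a placeholder); it works well for static pages, but since it never runs JavaScript, content that a script injects client-side will simply be absent from what it fetches:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get('http://www.example.com/');

print $mech->title, "\n";

# List every link found in the raw (un-scripted) HTML.
print $_->url, "\n" for $mech->links;
```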
Answer 4:
WWW::Scripter and its ::Plugin::Javascript can probably help you.
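A minimal sketch of the WWW::Scripter route, which keeps everything in-process with no browser required; WWW::Scripter subclasses WWW::Mechanize, and loading the JavaScript plugin lets it run page scripts (the URL is a placeholder, and its JS engine covers less of the web than a real browser, so results vary by site):

```perl
use strict;
use warnings;
use WWW::Scripter;

my $w = WWW::Scripter->new;
$w->use_plugin('JavaScript');    # enable script execution

$w->get('http://www.example.com/');

# The DOM after on-page scripts have run.
print $w->document->title, "\n";
print $w->content;
```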
Source: https://stackoverflow.com/questions/2703902/how-can-i-use-perl-to-scrape-a-website-that-reveals-its-content-with-javascript