Question:
I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows.
I got part of the way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook, which has IE8, and now I can't even fetch a simple page.
Is Win32::IE::Mechanize up to date with the latest versions of IE?
But, more to the point, given a recent WinXP machine, what's the quickest, easiest way to scrape a site which only reveals its content via JavaScript?
Answer 1:
WWW::Selenium.
- It lets you specify which browser to use (IE and Firefox are supported out of the box)
- It supports locating elements via XPath expressions, table IDs, link text (with regex matching!) and URLs
- It provides a Swiss army knife of user-interaction options, giving you flexibility in how you simulate end-user browsing
You'll need to download the Selenium Remote Control and have it running in the background for the module to work.
It may not be a good option if your page load times are unpredictable.
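A minimal sketch of driving IE through WWW::Selenium, assuming the Selenium Remote Control is already running on its default host and port (localhost:4444); the URL is a placeholder:

```perl
use strict;
use warnings;
use WWW::Selenium;

# Connect to a running Selenium RC server and launch IE.
my $sel = WWW::Selenium->new(
    host        => 'localhost',
    port        => 4444,
    browser     => '*iexplore',              # or '*firefox'
    browser_url => 'http://www.example.com/',
);

$sel->start;
$sel->open('http://www.example.com/');
$sel->wait_for_page_to_load(30_000);         # ms; tune for slow pages

# The DOM as rendered *after* JavaScript has run.
my $html = $sel->get_html_source;
print $html;

$sel->stop;
```

The `wait_for_page_to_load` timeout is the knob to watch when load times are unpredictable, as noted above.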
Answer 2:
Have a look at Win32::Watir. It's a newer module and explicitly supports IE 6, 7 and 8.
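A rough sketch of the Win32::Watir approach; the method names here (`new`, `goto`, `html`, `close`) reflect the module's Watir-style API as documented, but verify them against the version you install, since this module has seen several interface revisions:

```perl
use strict;
use warnings;
use Win32::Watir;

# Drive a visible IE window via OLE automation.
my $ie = Win32::Watir->new( visible => 1 );

$ie->goto('http://www.example.com/');

# Page source after IE has executed the page's JavaScript.
print $ie->html;

$ie->close;
```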
Answer 3:
I don't see any mention of WWW::Mechanize, so I'll bring it up for completeness (note that it does not execute JavaScript itself). Selenium is also becoming very popular and can be used in a lot of testing scenarios.
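For reference, the basic WWW::Mechanize pattern looks like this (URL is a placeholder); it works well for static pages, but since it never runs JavaScript, content that a script injects client-side will simply be absent from what it fetches:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get('http://www.example.com/');

print $mech->title, "\n";

# List every link found in the raw (un-scripted) HTML.
print $_->url, "\n" for $mech->links;
```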
Answer 4:
WWW::Scripter and its ::Plugin::Javascript can probably help you.
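A minimal sketch of the WWW::Scripter route, which keeps everything in-process with no browser required; WWW::Scripter subclasses WWW::Mechanize, and loading the JavaScript plugin lets it run page scripts (the URL is a placeholder, and its JS engine covers less of the web than a real browser, so results vary by site):

```perl
use strict;
use warnings;
use WWW::Scripter;

my $w = WWW::Scripter->new;
$w->use_plugin('JavaScript');    # enable script execution

$w->get('http://www.example.com/');

# The DOM after on-page scripts have run.
print $w->document->title, "\n";
print $w->content;
```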
Source: https://stackoverflow.com/questions/2703902/how-can-i-use-perl-to-scrape-a-website-that-reveals-its-content-with-javascript