How can I use Perl to scrape a website that reveals its content with JavaScript?

Submitted by 為{幸葍}努か on 2019-11-27 07:19:08

Question


I need to write a Perl script to scrape a website. The site only reveals its content via JavaScript, and the user is on Windows.

I made some progress with Win32::IE::Mechanize on my work machine, which has IE6, but on my netbook, which has IE8, I can't even fetch a simple page.

Is Win32::IE::Mechanize up to date with the latest versions of IE?

But, more to the point, given a recent WinXP machine, what's the quickest, easiest way to scrape a site which only reveals its content via JavaScript?


Answer 1:


WWW::Selenium.

  • It allows you to specify which browser to use (IE and Firefox are supported from the get-go)
  • It supports locating elements by XPath, table IDs, text (with regex matching!) and URLs
  • It provides a Swiss army knife of user-interaction options, giving you flexibility over how you wish to simulate end-user browsing

You'll need to download the Selenium Remote Control and have it running in the background for the module to work.

It may not be a good option if your page load times are unpredictable.
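The load-time concern above can often be worked around by polling for the element you actually need instead of waiting a fixed amount of time. The sketch below assumes a Selenium RC server is already running and that `$sel` is a `WWW::Selenium` object on which `start` has been called. `wait_until`, `scrape_text` and the locator `id=results` are hypothetical names chosen for illustration; `open`, `is_element_present` and `get_text` are `WWW::Selenium` methods.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: call $check->() repeatedly until it returns true
# or $timeout seconds have passed. Returns 1 on success, 0 on timeout.
sub wait_until {
    my ($check, $timeout, $interval) = @_;
    $interval //= 0.5;                      # seconds between polls
    my $deadline = time() + $timeout;
    while (1) {
        return 1 if $check->();
        return 0 if time() >= $deadline;
        sleep $interval;                    # Time::HiRes sleep accepts fractions
    }
}

# Hypothetical helper: open $url in an already-started WWW::Selenium
# session ($sel) and return the text of $locator once the page's
# JavaScript has put it into the DOM. Dies if it never appears.
sub scrape_text {
    my ($sel, $url, $locator, $timeout) = @_;
    $sel->open($url);
    wait_until(sub { $sel->is_element_present($locator) }, $timeout)
        or die "Timed out after ${timeout}s waiting for $locator\n";
    return $sel->get_text($locator);
}
```

With a real session you would build the object along the lines of `WWW::Selenium->new(host => 'localhost', port => 4444, browser => '*iexplore', browser_url => 'http://example.com/')`, call `$sel->start`, then `print scrape_text($sel, 'http://example.com/page', 'id=results', 30);`, and finish with `$sel->stop`. Polling from Perl keeps the timeout logic visible in your script; `wait_for_condition` with a JavaScript expression is an alternative if you prefer to let Selenium do the waiting.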




Answer 2:


Have a look at Win32::Watir. It's a newer module and explicitly supports IE 6, 7 and 8.




Answer 3:


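I don't see any mention of WWW::Mechanize, so I'll bring it up just for completeness, though note that it does not execute JavaScript itself, so on its own it won't see content that scripts generate. Selenium is also becoming very popular and can be used in a lot of testing scenarios.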




Answer 4:


WWW::Scripter and its WWW::Scripter::Plugin::JavaScript plugin can probably help you.



Source: https://stackoverflow.com/questions/2703902/how-can-i-use-perl-to-scrape-a-website-that-reveals-its-content-with-javascript
