How can I screen scrape with Perl?

别来无恙 posted on 2019-12-17 22:25:20

Question


I need to display some values that are stored on a website. To do that, I need to scrape the site and fetch the content from a table. Any ideas?


Answer 1:


If you are familiar with jQuery, you might want to check out pQuery, which makes this very easy:

## print every <h2> tag in page
use pQuery;

pQuery("http://google.com/search?q=pquery")
    ->find("h2")
    ->each(sub {
        my $i = shift;
        print $i + 1, ") ", pQuery($_)->text, "\n";
    });

There's also HTML::DOM.

Whatever you do, though, don't use regular expressions for this.




Answer 2:


I have used HTML::TableExtract in the past. I personally find it a bit clumsy to use, but maybe I did not understand the object model well. I usually use this part of the manual to examine the data:

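use HTML::TableExtract;

# $html_string is assumed to already hold the HTML of the page
my $te = HTML::TableExtract->new();
$te->parse($html_string);

# Examine all matching tables
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        # Cells can be undef (e.g. in ragged tables), so map them to ''
        print join(',', map { $_ // '' } @$row), "\n";
    }
}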



Answer 3:


Although I've generally done this with LWP/LWP::Simple, the current 'preferred' module for any sort of webpage scraping in Perl is WWW::Mechanize.
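For the table case in the question, a minimal sketch might pair WWW::Mechanize (for fetching) with HTML::TableExtract (for parsing). The sub name `table_rows_from_url` and the script name in the usage comment are made up for illustration, and the sketch assumes both modules are installed from CPAN.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TableExtract;

# Hypothetical helper: fetch a page and return every table row
# as an array ref of cell strings.
sub table_rows_from_url {
    my ($url) = @_;

    # autocheck => 1 makes get() die on HTTP errors
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url);

    # With no constraints, HTML::TableExtract matches every table
    my $te = HTML::TableExtract->new;
    $te->parse($mech->content);

    my @rows;
    for my $table ($te->tables) {
        push @rows, $_ for $table->rows;
    }
    return @rows;
}

# Hypothetical usage: perl scrape.pl http://example.com/page-with-table
if (@ARGV) {
    for my $row (table_rows_from_url($ARGV[0])) {
        print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

Mechanize earns its keep once the table sits behind a login form or a chain of links. For a single static page, LWP::UserAgent alone would do the same job.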




Answer 4:


If you're familiar with XPath, you can also use HTML::TreeBuilder::XPath. And if you're not... well you should be ;--)
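As a rough sketch of what that looks like for a table, the snippet below parses an inline HTML string and pulls out the data cells with XPath. The sample markup and the `prices` id are invented for illustration; in practice the HTML would come from LWP or WWW::Mechanize.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample page; a real script would fetch this instead
my $html = <<'HTML';
<html><body>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Select only rows that contain <td> cells, which skips the header row
my @rows;
for my $tr ($tree->findnodes('//table[@id="prices"]//tr[td]')) {
    push @rows, [ map { $_->as_trimmed_text } $tr->findnodes('./td') ];
}

print join(',', @$_), "\n" for @rows;

# Free the parse tree explicitly
$tree->delete;
```

The `tr[td]` predicate is the part that saves a manual check for header rows: the row is matched only if it has at least one `td` child.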




Answer 5:


You could also use the simple Perl module Web::Scraper. It is easy to understand and has made life easy for me. See this example for more information:

http://teusje.wordpress.com/2010/05/02/web-scraping-with-perl/




Answer 6:


For similar Stack Overflow questions, have a look at:

  • How can I extract URLs from a web page in Perl
  • How can I extract XML of a website and save in a file using Perl’s LWP?

I do like using pQuery for things like this; however, Web::Scraper does look interesting.




Answer 7:


I don't mean to drag up a dead thread, but anyone googling across this thread should also check out WWW::Scripter: 'For scripting web sites that have scripts'.

Happy remote data aggregating ;)




Answer 8:


Take a look at the magical Web::Scraper, it's THE tool for web scraping.
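To give a feel for it, here is a small sketch that scrapes table rows from an inline HTML string using CSS selectors. The sample markup, the `prices` id, and the `rows`/`cells` key names are all invented for illustration; `scrape` can also be given a URI object to fetch a live page.

```perl
use strict;
use warnings;
use Web::Scraper;

# Made-up sample page; a real script would pass URI->new($url) instead
my $html = <<'HTML';
<html><body>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>
HTML

# Outer scraper collects one entry per <tr>;
# the nested scraper collects the text of each <td> in that row
my $table_scraper = scraper {
    process '#prices tr', 'rows[]' => scraper {
        process 'td', 'cells[]' => 'TEXT';
    };
};

my $result = $table_scraper->scrape($html);

# The header row has no <td> cells, so drop entries without 'cells'
my @rows = map { $_->{cells} }
           grep { $_->{cells} } @{ $result->{rows} || [] };

print join(',', @$_), "\n" for @rows;
```

The result is a plain nested data structure, so it can be handed straight to whatever displays the values.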




Answer 9:


I use LWP::UserAgent for most of my screen scraping needs. You can also couple that with HTTP::Cookies if you need cookie support.

Here's a simple example of how to get the page source.

use LWP::UserAgent;
use HTTP::Cookies;

my $cookie_jar = HTTP::Cookies->new;
my $browser    = LWP::UserAgent->new;
$browser->cookie_jar($cookie_jar);

my $resp = $browser->get("https://www.stackoverflow.com");
if ($resp->is_success) {
    # Play with your source here
    my $source = $resp->decoded_content;
    # /s lets . match newlines; .*? stops at the first <table>
    $source =~ s/^.*?<table>/<table>/is;  # this is just an example,
    print $source;                        # not a solution to your problem.
}



Answer 10:


Check out this little example of web scraping with Perl: link text



Source: https://stackoverflow.com/questions/713827/how-can-i-screen-scrape-with-perl
