Emulation of lex-like functionality in Perl or Python

梦毁少年i 2021-01-13 23:46

Here's the deal. Is there a way to tokenize the strings in a line based on multiple regexes?

One example:

I have to get all href tags, their corresponding

8 answers
  •  予麋鹿 (original poster)
    2021-01-14 00:30

    If you're specifically after parsing links out of web pages, then Perl's WWW::Mechanize module will figure things out for you in a very elegant fashion. Here's a sample program that fetches the front page of Stack Overflow and parses out all the links, printing their text and corresponding URLs:

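    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    
    # Turn autocheck off so the failure message below is the one that gets shown.
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    
    # Fetch the page; Mechanize parses the HTML for us.
    $mech->get("http://stackoverflow.com/");
    
    $mech->success or die "Oh no! Couldn't fetch stackoverflow.com";
    
    # links() returns one WWW::Mechanize::Link object per link on the page.
    foreach my $link ($mech->links) {
        # text can be undef (e.g. image-only links), so supply a fallback.
        my $text = defined $link->text ? $link->text : '(no text)';
        print "* [", $text, "] points to ", $link->url, "\n";
    }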
    

    In the main loop, each $link is a WWW::Mechanize::Link object, so you're not just constrained to getting the text and URL.
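    For the Python half of the question, here is a rough standard-library analogue of the same idea. It is only a sketch, not a port of WWW::Mechanize: the names LinkCollector and extract_links and the sample HTML are made up for illustration, and it parses a string you already have rather than fetching a page. Each link comes back as a dict holding the text, the absolute URL and all of the tag's attributes, so, as with the Link objects above, you are not limited to text and URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect every <a href=...> with text, URL and attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []        # finished links, in document order
        self._current = None   # link currently being read, or None outside <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._current = {
                # Resolve relative hrefs against the page URL.
                "url": urljoin(self.base_url, attrs["href"]),
                "attrs": attrs,
                "text": "",
            }

    def handle_data(self, data):
        # Accumulate text, including text inside nested tags such as <b>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace before storing the link.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def extract_links(html, base_url):
    """Return a list of link dicts found in the given HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = ('<p><a href="/questions" title="All">Questions</a> and '
              '<a href="http://example.com/">Example</a></p>')
    for link in extract_links(sample, "http://stackoverflow.com/"):
        print("* [%s] points to %s" % (link["text"], link["url"]))
```

    Anchors without an href are skipped, and nested <a> tags are not handled; for real-world pages a dedicated HTML library is likely the safer choice.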

    All the best,

    Paul
