HTML parsing in Perl

Submitted by ∥☆過路亽.° on 2019-12-28 02:04:56

Question


I'm trying to parse the following HTML structure in Perl. I need to select all of the dd elements that have the class "message" and also an id. All I want the script to do is loop through those dd elements and print each one's id, ignoring the first dd element, which is static and will not change.

Any Perl module is fine, as long as it can be installed from CPAN to keep things easy for me. I don't have much experience with Perl or with parsing HTML, so any pointers would be very helpful.

Thanks :)

HTML Structure:

<html>
<head>
</head>
<body>
 .....other elements
    <div id="messages">
        <div class="header"></div>
        <dl>
            <dd class="message unread mc-friend mc-message">This is just a random message, do not parse</dd>
            <dd id="msg2" class="message unread mc-message">
                Hello
            </dd>
            <dd id="msg3" class="message unread mc-message">
                Hello
            </dd>
        </dl>
    </div>
</body>
</html>

Answer 1:


Something like this, quick and easy:

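#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

my $html = "Your HTML goes here";

# Parse the HTML into a DOM tree
my $dom = Mojo::DOM->new;
$dom->parse($html);

# Walk every dd whose class attribute contains "message".
# $skip++ returns the old value: undef (false) for the first match,
# true afterwards, so the first (static) dd is skipped.
# Note: older Mojolicious releases called attr() "attrs()".
my $skip;
for my $dd ($dom->find('dd[class*="message"]')->each) {
    print $dd->attr('id'), "\n" if $skip++;
}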
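If, as in the sample HTML, the static first dd is the only one without an id, the CSS selector itself can do the filtering instead of the skip counter. The following is a minimal sketch under that assumption; the helper name message_ids is hypothetical, and it assumes a Mojolicious version that provides attr(). The selector dd.message[id] matches dd elements whose class list contains the word "message" and that have an id attribute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper: return the id of every <dd> that has the class
# "message" AND an id attribute, in document order.
sub message_ids {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    # "dd.message[id]": dd element, class list contains "message",
    # id attribute present. The static first <dd> has no id, so it is excluded.
    return map { $_->attr('id') } $dom->find('dd.message[id]')->each;
}

# Sample markup taken from the question.
my $html = <<'HTML';
<div id="messages">
  <dl>
    <dd class="message unread mc-friend mc-message">This is just a random message, do not parse</dd>
    <dd id="msg2" class="message unread mc-message">Hello</dd>
    <dd id="msg3" class="message unread mc-message">Hello</dd>
  </dl>
</div>
HTML

print "$_\n" for message_ids($html);    # prints msg2, then msg3
```

The trade-off: this version relies on the static dd never having an id, whereas the counter in the answer relies on it always being the first match.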



Answer 2:


Have a look at HTML::Parser, or better yet, HTML::TreeBuilder.

More on TreeBuilder.
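As a rough illustration of the TreeBuilder route, here is a minimal sketch of the same task using HTML::TreeBuilder (from the HTML-Tree distribution on CPAN). The helper name message_ids is hypothetical. It uses look_down with a code-ref criterion to keep only dd elements that have an id and whose class list contains the word "message"; like the selector variant, it assumes the static first dd is the one without an id.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper: ids of <dd> elements with class "message" and an id.
sub message_ids {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down returns, in document order, every element that satisfies
    # all criteria: tag name "dd", plus a code-ref test on the element.
    my @dds = $tree->look_down(
        _tag => 'dd',
        sub {
            my $el    = shift;
            my $class = $el->attr('class') // '';
            return defined $el->attr('id')
                && $class =~ /(?:^|\s)message(?:\s|$)/;
        },
    );
    my @ids = map { $_->attr('id') } @dds;

    $tree->delete;    # free the parse tree when done
    return @ids;
}

my $html = '<dl><dd class="message unread">static</dd>'
         . '<dd id="msg2" class="message unread">Hello</dd>'
         . '<dd id="msg3" class="message unread">Hello</dd></dl>';
print "$_\n" for message_ids($html);
```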



Source: https://stackoverflow.com/questions/4598162/html-parsing-in-perl
