How can I remove unused, nested HTML span tags with a Perl regex?


南旧 2021-01-06 10:23

I'm trying to remove unused spans (i.e. those with no attributes) from HTML files, having already cleaned up all the attributes I didn't want with other regular expressions.

4 answers

不知归路 (OP)

2021-01-06 11:01

Try HTML::Parser:

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser;

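# Stack with one entry per open <span>: true if that span had attributes.
my @print_span;
my $p = HTML::Parser->new(
  start_h   => [ sub {
    my ($text, $name, $attr) = @_;
    if ( $name eq 'span' ) {
      # Keep the tag only if it carries at least one attribute.
      my $print_tag = scalar keys %$attr;
      push @print_span, $print_tag;
      return if !$print_tag;
    }
    print $text;
  }, 'text,tagname,attr'],
  end_h => [ sub {
    my ($text, $name) = @_;
    if ( $name eq 'span' ) {
      # Drop the closing tag if its matching opening tag was dropped.
      return if !pop @print_span;
    }
    print $text;
  }, 'text,tagname'],
  # Everything else (text, comments, declarations) passes through unchanged.
  default_h => [ sub { print shift }, 'text'],
);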
$p->parse_file(\*DATA) or die "Err: $!";
$p->eof;

__END__


This is a title


This is a header
a b c de
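
The sample input after __END__ no longer shows its markup, so here is a rough sketch of the same stack idea applied to a string instead of a file. The helper name strip_bare_spans and the sample input are made up for illustration and are not part of the answer above; the sketch assumes HTML::Parser is installed and that the input is reasonably well-nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not from the original answer): returns $html with
# every attribute-less <span> and its matching </span> removed.
sub strip_bare_spans {
    my ($html) = @_;
    my $out = '';
    my @keep;    # one flag per open <span>: 1 if that span had attributes

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($text, $name, $attr) = @_;
            if ( $name eq 'span' ) {
                my $has_attr = scalar(keys %$attr) ? 1 : 0;
                push @keep, $has_attr;
                return if !$has_attr;    # drop the bare opening tag
            }
            $out .= $text;
        }, 'text,tagname,attr' ],
        end_h => [ sub {
            my ($text, $name) = @_;
            if ( $name eq 'span' && @keep ) {
                # Drop the closer only if its opener was dropped.
                return if !pop @keep;
            }
            $out .= $text;
        }, 'text,tagname' ],
        # Text, comments and declarations are copied through unchanged.
        default_h => [ sub {
            my ($text) = @_;
            $out .= $text if defined $text;
        }, 'text' ],
    );

    $p->parse($html);
    $p->eof;
    return $out;
}

# Made-up input: a bare span wrapping a span that has an attribute.
my $in = '<p>a <span>b <span class="x">c</span> d</span> e</p>';
print strip_bare_spans($in), "\n";
# prints: <p>a b <span class="x">c</span> d e</p>
```

Unlike the script above, this variant leaves a stray </span> with no open span in place rather than dropping it. Because each closing tag is paired with its own opening tag through the stack, the inner span keeps both of its tags even though the outer one is removed, which is the part a single non-greedy regex tends to get wrong.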
