What's the easiest way to strip HTML tags in Perl? I am using a regular expression to parse HTML from a URL, which works great, but how can I strip the HTML tags off?
Have a look at the HTML::Restrict module, which lets you either strip all HTML tags or restrict which tags are allowed. A minimal example that strips all HTML tags:
use HTML::Restrict;
my $hr = HTML::Restrict->new();
my $processed = $hr->process('<b>i am bold</b>'); # returns 'i am bold'
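If you want to keep some tags rather than strip everything, the "restrict" side can be sketched roughly like this. It assumes the `rules` constructor option as I understand it from the HTML::Restrict documentation; the sample markup and the choice to keep only `<b>` are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Restrict;

# Allow only <b>, with no attributes; every other tag is removed.
# (The rule set here is an illustrative assumption, not a recommendation.)
my $hr = HTML::Restrict->new(
    rules => {
        b => [],
    },
);

# Hypothetical input: <p> and <i> are not in the rules, so they should go.
my $html      = '<p><b>hello</b> <i>world</i></p>';
my $processed = $hr->process($html);

print "$processed\n";
```

The text inside removed tags is kept; only the markup itself is dropped.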
I would recommend staying away from HTML::Strip because it breaks UTF-8 encoding.