My Perl program takes some text from a disk file as input, wraps it in some XML, then outputs it to STDOUT. The input is nominally UTF-8, but sometimes has junk inserted. I
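You have a nominally UTF-8 byte string that contains some invalid sequences.
Decoding it with Encode::FB_DEFAULT replaces each malformed sequence with the substitution character U+FFFD; re-encoding the result then gives valid UTF-8.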
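    use Encode qw(decode encode);

    # decode() returns Perl characters (not octets); malformed bytes become U+FFFD
    my $chars = decode('UTF-8', $malformed_utf8, Encode::FB_DEFAULT);
    # encode() turns the characters back into valid UTF-8 octets
    my $good_utf8 = encode('UTF-8', $chars, Encode::FB_CROAK);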
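
To see the effect on a concrete input, here is a minimal sketch. The helper name sanitize_utf8 and the sample string with one stray 0xFF byte are made up for illustration; neither comes from the question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes any byte string, returns valid UTF-8 octets.
sub sanitize_utf8 {
    my ($bytes) = @_;
    # FB_DEFAULT: each malformed sequence becomes U+FFFD instead of dying.
    my $chars = decode('UTF-8', $bytes, Encode::FB_DEFAULT);
    # The decoded string is well-formed, so this encode is not expected to croak.
    return encode('UTF-8', $chars, Encode::FB_CROAK);
}

# Made-up sample input: the byte 0xFF can never appear in valid UTF-8.
my $dirty = "abc\xFFdef";
my $clean = sanitize_utf8($dirty);

# Print the result as hex so the replacement bytes are visible.
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $clean)), "\n";
# prints: 61 62 63 EF BF BD 64 65 66
```

The bytes EF BF BD are the UTF-8 encoding of U+FFFD, so the junk byte has been swapped for a visible replacement character while the surrounding text is untouched. If you would rather drop the junk silently, you could strip U+FFFD from the decoded string before encoding, at the cost of also removing any replacement characters that were already in the input.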