How do I sanitize invalid UTF-8 in Perl?

生来不讨喜 2020-12-05 10:26

My Perl program takes some text from a disk file as input, wraps it in some XML, then outputs it to STDOUT. The input is nominally UTF-8, but sometimes has junk inserted. I

2 Answers
  •  一个人的身影
    2020-12-05 11:19

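    Your input is a byte string that is supposed to be UTF-8 but contains some invalid sequences.

    The code below replaces each invalid sequence with the default substitution character, U+FFFD (REPLACEMENT CHARACTER).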

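    use Encode qw(decode encode);

    # Decode leniently: each malformed sequence becomes U+FFFD.
    # Note that decode() returns characters, not octets.
    my $chars     = decode('UTF-8', $malformed_utf8, Encode::FB_DEFAULT);

    # Re-encode to bytes. $chars is valid by now, so FB_CROAK should never fire.
    my $good_utf8 = encode('UTF-8', $chars,          Encode::FB_CROAK);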
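
As a rough sketch, the same decode-then-encode round trip can be wrapped in a small helper and tried on a deliberately broken input. The sub name `sanitize_utf8` and the sample string are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (name is an assumption, not from the answer).
# Takes a byte string that should be UTF-8 and returns valid UTF-8 bytes.
sub sanitize_utf8 {
    my ($bytes) = @_;

    # Lenient decode: malformed sequences are replaced with U+FFFD.
    my $chars = decode('UTF-8', $bytes, Encode::FB_DEFAULT);

    # Strict encode back to bytes. With FB_CROAK the source scalar may be
    # consumed in place, which is fine because $chars is not reused.
    return encode('UTF-8', $chars, Encode::FB_CROAK);
}

# Sample input: the byte 0xFF can never appear in well-formed UTF-8.
my $input  = "abc\xFFdef";
my $output = sanitize_utf8($input);

binmode STDOUT;          # write the sanitized bytes unchanged
print $output, "\n";
```

Well-formed input passes through byte for byte, so the helper should be safe to apply to every line read from the file. One caveat if the result is going into XML: control characters such as NUL are valid UTF-8 but are still not allowed in XML 1.0, and this sketch does not touch them.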
    
