Question
Does this work correctly? Some error messages are already decoded and some need to be decoded to get correct output.
#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use open qw(:utf8 :std);
use Encode qw(decode_utf8);
# ...
if ( not eval {
    # some error messages (UTF-8) are decoded, some are not
    1 }
) {
    if ( utf8::is_utf8 $@ ) {
        print $@;
    }
    else {
        print decode_utf8( $@ );
    }
}
Answer 1:
Am I using utf8::is_utf8 correctly?
No. Any use of utf8::is_utf8 is incorrect, as you should never use it! Using utf8::is_utf8 to guess at the semantics of a string is an instance of what's known as The Unicode Bug. Except for inspecting the internal state of variables when debugging Perl or an XS module, utf8::is_utf8 has no use.
It does not indicate whether the value in a variable is encoded using UTF-8 or not. In fact, that's impossible to know reliably. For example, does "\xC3\xA9" produce a string that's encoded using UTF-8 or not? Well, there's no way to know! It depends on whether I meant "Ã©", "é" or something entirely different.
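To illustrate why the flag says nothing about meaning, here is a minimal sketch (the variable names are only for illustration). It takes the same "\xC3\xA9" string, changes its internal storage with utf8::upgrade, and checks that the flag flips while the string itself stays the same two characters.

```perl
use strict;
use warnings;

my $s = "\xC3\xA9";                          # two characters: U+00C3 and U+00A9
my $flag_before = utf8::is_utf8($s) ? 1 : 0;

my $t = $s;
utf8::upgrade($t);                           # changes internal storage only
my $flag_after = utf8::is_utf8($t) ? 1 : 0;

my $same = ( $s eq $t ) ? 1 : 0;             # string equality ignores the flag

printf "flag before: %d, flag after: %d, equal: %d, length: %d\n",
    $flag_before, $flag_after, $same, length $t;
```

Both variables hold the same string as far as Perl code is concerned, so the flag cannot be used to tell whether that string was meant as UTF-8 bytes or as text.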
If the variable may contain both encoded and decoded strings, it's up to you to track that using a second variable. I strongly advise against this, though. Just decode everything as it comes in from the outside.
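As a rough sketch of decoding at the boundary, the following uses Encode::decode on a byte string as soon as it arrives. The variable $raw is a hypothetical stand-in for whatever comes from a file, socket, or library, and the strict FB_CROAK check is an assumption about how malformed input should be handled, not something the question requires.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for bytes arriving from the outside world.
my $raw = "na\xC3\xAFve";

# Decode once, at the boundary. FB_CROAK makes malformed input die instead of
# being silently replaced; LEAVE_SRC keeps $raw untouched.
my $text = decode( 'UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC() );

printf "%d bytes in, %d characters out\n", length $raw, length $text;
```

From that point on, the rest of the program only ever sees decoded text, so there is nothing left to guess.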
If you really can't, your best bet is to try to decode $@ and ignore errors. It's very unlikely that something readable that isn't UTF-8 would be valid UTF-8.
# $@ is sometimes encoded. If it's not,
# the following will leave it unchanged.
utf8::decode($@);
print $@;
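The behaviour this relies on can be checked with a small sketch. The three sample strings and the code_points helper are made up for illustration; the strings are printed as code points so the output stays plain ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: render a string as a list of code points.
sub code_points {
    return join ' ', map { sprintf( 'U+%04X', ord($_) ) } split //, $_[0];
}

my $encoded = "caf\xC3\xA9";   # UTF-8 bytes for "café"
my $decoded = "caf\x{E9}";     # "café", already decoded
my $wide    = "\x{263A}";      # already decoded, character above 0xFF

# utf8::decode works in place and reports whether it decoded anything valid.
my $ok_encoded = utf8::decode($encoded);
my $ok_decoded = utf8::decode($decoded);
my $ok_wide    = utf8::decode($wide);

print 'encoded -> ', code_points($encoded), "\n";
print 'decoded -> ', code_points($decoded), "\n";
print 'wide    -> ', code_points($wide),    "\n";
```

The encoded string ends up decoded, while the two already-decoded strings are left alone because they are not valid UTF-8 when treated as bytes. The one case this cannot catch is a decoded string whose characters happen to form valid UTF-8, which is the unlikely situation mentioned above.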
Source: https://stackoverflow.com/questions/14579560/am-i-using-utf8is-utf8-correctly