Am I using utf8::is_utf8 correctly?

Submitted by 亡梦爱人 on 2019-12-10 11:45:48

Question


Does this work correctly? Some error messages are already decoded and some need to be decoded to get correct output.

#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use open qw(:utf8 :std);
use Encode qw(decode_utf8);

# ...

if ( not eval{
    # some error-messages (utf8) are decoded some are not
    1 }
) {
    if ( utf8::is_utf8 $@ ) {
        print $@;
    }
    else {
        print decode_utf8( $@ );
    }
}

Answer 1:


Am I using utf8::is_utf8 correctly?

No. Any use of utf8::is_utf8 is incorrect, as you should never use it! Using utf8::is_utf8 to guess at the semantics of a string is an instance of what's known as The Unicode Bug. Except for inspecting the internal state of variables when debugging Perl or an XS module, utf8::is_utf8 has no use.

It does not indicate whether the value in a variable is encoded using UTF-8 or not. In fact, that's impossible to know reliably. For example, does "\xC3\xA9" produce a string that's encoded using UTF-8 or not? Well, there's no way to know! It depends on whether I meant "Ã©", "é" or something entirely different.
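To make that concrete, here is a minimal sketch showing that the flag reported by utf8::is_utf8 tracks only how Perl stores a string internally, not what the string means. The variable names are made up for illustration; utf8::upgrade and utf8::downgrade are the core functions that switch the internal storage format without changing the string's characters.

```perl
use strict;
use warnings;

# Two copies of the same two-character string.
my $downgraded = "\xC3\xA9";
my $upgraded   = "\xC3\xA9";

utf8::downgrade($downgraded);   # force the one-byte-per-character internal format
utf8::upgrade($upgraded);       # force the UTF-8 internal format

# As far as Perl code is concerned, the two strings are identical ...
print "same string\n" if $downgraded eq $upgraded;
print "length: ", length($upgraded), "\n";

# ... yet the internal flag differs, so it cannot tell you whether
# the content is "encoded" or "decoded".
print "downgraded flag: ", ( utf8::is_utf8($downgraded) ? 1 : 0 ), "\n";
print "upgraded flag:   ", ( utf8::is_utf8($upgraded)   ? 1 : 0 ), "\n";
```

Both strings compare equal and have the same length; only the storage format, and therefore the flag, is different.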

If the variable may contain both encoded and decoded strings, it's up to you to track that using a second variable. I strongly advise against this, though. Just decode everything as it comes in from the outside.
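A rough sketch of "decode everything as it comes in", assuming the outside data is UTF-8. The sample bytes and variable names are invented for illustration; decode and encode are the functions exported by the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside the program (assumed to be UTF-8).
my $bytes_in = "na\xC3\xAFve";

# Decode once, at the boundary.
my $text = decode('UTF-8', $bytes_in);

# Inside the program, every string is text (characters), never bytes.
print "bytes in:   ", length($bytes_in), "\n";
print "characters: ", length($text), "\n";

# Encode once, on the way out.
my $bytes_out = encode('UTF-8', $text);
print "round trip ok\n" if $bytes_out eq $bytes_in;
```

With this discipline there is never a variable that may hold either kind of string, so nothing has to be guessed later.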

If you really can't, your best bet is to try to decode $@ and ignore errors. It's very unlikely that something readable that isn't UTF-8 would be valid UTF-8.

# $@ is sometimes encoded. If it's not,
# the following will leave it unchanged.
utf8::decode($@);

print $@;
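To see how that behaves on both kinds of input, here is a small sketch. The helper name maybe_decode and the sample strings are hypothetical; the only real API used is utf8::decode, which decodes in place and returns false, leaving the string untouched, when the content is not valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a copy and returns it decoded if it
# held valid UTF-8, or unchanged otherwise.
sub maybe_decode {
    my ($s) = @_;
    utf8::decode($s);   # failure is deliberately ignored
    return $s;
}

my $encoded = "caf\xC3\xA9";   # UTF-8 bytes: 5 bytes
my $decoded = "caf\x{E9}";     # already decoded: 4 characters

my $from_encoded = maybe_decode($encoded);
my $from_decoded = maybe_decode($decoded);

print "from encoded: ", length($from_encoded), " characters\n";
print "from decoded: ", length($from_decoded), " characters\n";
print "same result\n" if $from_encoded eq $from_decoded;
```

The already-decoded string survives because a lone "\xE9" is not a valid UTF-8 sequence, so the decode attempt fails and changes nothing.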


Source: https://stackoverflow.com/questions/14579560/am-i-using-utf8is-utf8-correctly
