"Wide character in subroutine entry" - UTF-8 encoded Cyrillic words as sequence of bytes

Posted by 放荡痞女 on 2019-12-05 05:58:23

md5_hex expects a string of bytes for input, but you're passing a decoded string (a string of Unicode Code Points). Explicitly encode the string.

use strict;
use utf8;
use Digest::MD5;
use Encode;
# ....
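# $_ is assumed to hold decoded text (characters); it is encoded without any check
print Digest::MD5::md5_hex(Encode::encode_utf8($_)),"\n";
# Conversion only when required. Caveat: utf8::is_utf8 reports Perl's internal
# storage flag, which is only a heuristic for "this string was decoded"
print Digest::MD5::md5_hex(utf8::is_utf8($_) ? Encode::encode_utf8($_) : $_),"\n";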
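To see the difference in isolation, here is a minimal sketch. The sample word and the variable names are made up for illustration and do not come from the original question. It shows md5_hex rejecting a decoded string and accepting the encoded bytes.

```perl
use strict;
use warnings;
use utf8;                          # string literals in this file are decoded text
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

my $word = "привет";               # hypothetical sample input: 6 characters

# Passing the decoded string directly dies with
# "Wide character in subroutine entry"; eval traps the error.
my $ok    = eval { md5_hex($word); 1 };
my $error = $ok ? '' : $@;

# Encoding first hands md5_hex the byte string it expects.
my $bytes  = encode_utf8($word);   # 12 bytes: two per Cyrillic letter
my $digest = md5_hex($bytes);

print "error:  $error";
print "digest: $digest\n";
```

The digest is computed over the UTF-8 bytes, so it should match what other tools produce for the same text stored as UTF-8.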

"My real problem is that length($_) reports too high values"

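Yes. You are reading from the ARGV file handle and haven't set its encoding to UTF-8, so each line arrives as raw bytes and length($_) counts bytes rather than characters. Every Cyrillic letter occupies two bytes in UTF-8, which is why the reported lengths are too high.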

You can use the open pragma to fix this. Instead of all your binmode statements, use

use open qw/ :std :encoding(utf8) /;

which changes the default layer for all filehandles opened in the pragma's scope, including the standard ones (STDIN, STDOUT and STDERR, via :std), to :encoding(utf8).
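The effect of a decoding layer on length can be checked without any files. The sketch below is an illustration, not the asker's code: it reads the same UTF-8 bytes from an in-memory filehandle twice, once with no layer and once with :encoding(UTF-8). The sample word and variable names are assumptions.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);

# The UTF-8 bytes as they would sit in a file (hypothetical sample line).
my $raw = encode_utf8("слово\n");

# No layer: the line is read as bytes.
open my $as_bytes, '<', \$raw or die "open failed: $!";
my $line_bytes = <$as_bytes>;
chomp $line_bytes;

# Decoding layer: the line is read as characters.
open my $as_text, '<:encoding(UTF-8)', \$raw or die "open failed: $!";
my $line_text = <$as_text>;
chomp $line_text;

print length($line_bytes), "\n";   # 10 (bytes)
print length($line_text),  "\n";   # 5  (characters)
```

With the open pragma in effect, the explicit layer in the second open is what filehandles opened in that scope get by default.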
