As @ikegami suggested, I reported this as a bug.
Bug #121783 for perl5: Windows: UTF-8 encoded output in cmd.exe with code page 65001 causes unexpec
The following program produces the correct output:
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
binmode(STDOUT, ":unix:encoding(utf8):crlf");
print 'αβγxyz', "\n";
Output:
C:\…> chcp 65001
Active code page: 65001

C:\…> perl pttt.pl
αβγxyz

This seems to indicate that there is some funkiness with the :crlf layer. I do not understand the internals well enough to comment intelligently on this at this point.
After many experiments, I have come to the conclusion that, if the console is already set to code page 65001, binmode(STDOUT, ":unix:encoding(utf8):crlf"); will "work". However, note the following:
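# Assumption: Dump comes from a YAML module; the import is not shown in the original.
use YAML;

binmode(STDOUT, ":unix:encoding(utf8):crlf");

# Dump each layer as name, argument, flags; render the numeric flags as hex.
print Dump [
    map {
        my $x = defined($_) ? $_ : '';
        $x =~ s/\A([0-9]+)\z/sprintf '0x%08x', $1/eg;
        $x;
    } PerlIO::get_layers(STDOUT, details => 1)
];
print "αβγxyz\n";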
gives me:
---
- unix
- ''
- 0x01205200
- crlf
- ''
- 0x00c85200
- unix
- ''
- 0x01201200
- encoding
- utf8
- 0x00c89200
- crlf
- ''
- 0x00c8d200
αβγxyz

As before, I do not know enough to know the full consequences of this. I do intend to build a debug perl at some point to further diagnose this.
I examined this a little further. Here are some observations from that post:
The flags for the first unix layer are 0x01205200 = CANWRITE | TRUNCATE | CRLF | OPEN | NOTREG. Why is CRLF set for the unix layer on Windows? I do not know enough about the internals to understand this.
However, the flags for the second unix layer, the one pushed by my explicit binmode, are 0x01201200 = 0x01205200 & ~CRLF. This is what would have made sense to me to begin with.
The flags for the first crlf layer are 0x00c85200 = CANWRITE | TRUNCATE | CRLF | LINEBUF | FASTGETS | TTY. The flags for the second crlf layer, which I push after the :encoding(utf8) layer, are 0x00c8d200 = 0x00c85200 | UTF8.
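For anyone who wants to check these decompositions by hand, here is a minimal sketch that splits a flags value into named bits. The bit values are the PERLIO_F_* constants as I recall them from perliol.h. The ones used in the observations above are consistent with the numbers shown there, while the remaining entries should be treated as an assumption and checked against your perl's headers. The helper name decode_flags is made up for illustration.

```perl
use strict;
use warnings;

# PERLIO_F_* bit values, believed to match perliol.h (verify against your perl).
my @FLAGS = (
    [ EOF      => 0x00000100 ],
    [ CANWRITE => 0x00000200 ],
    [ CANREAD  => 0x00000400 ],
    [ ERROR    => 0x00000800 ],
    [ TRUNCATE => 0x00001000 ],
    [ APPEND   => 0x00002000 ],
    [ CRLF     => 0x00004000 ],
    [ UTF8     => 0x00008000 ],
    [ UNBUF    => 0x00010000 ],
    [ WRBUF    => 0x00020000 ],
    [ RDBUF    => 0x00040000 ],
    [ LINEBUF  => 0x00080000 ],
    [ TEMP     => 0x00100000 ],
    [ OPEN     => 0x00200000 ],
    [ FASTGETS => 0x00400000 ],
    [ TTY      => 0x00800000 ],
    [ NOTREG   => 0x01000000 ],
);

# Hypothetical helper: return the names of all bits set in $value,
# lowest bit first, joined with " | ".
sub decode_flags {
    my ($value) = @_;
    return join ' | ', map { $_->[0] } grep { $value & $_->[1] } @FLAGS;
}

# The four STDOUT values discussed above.
for my $v (0x01205200, 0x01201200, 0x00c85200, 0x00c8d200) {
    printf "0x%08x = %s\n", $v, decode_flags($v);
}
```

Running it on 0x01205200 and 0x01201200 makes it easy to see that the two unix layers differ only in the CRLF bit.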
Now, if I open a file using open my $fh, '>:encoding(utf8)', 'ttt', and dump the same information, I get:

---
- unix
- ''
- 0x00201200
- crlf
- ''
- 0x00405200
- encoding
- utf8
- 0x00409200

As expected, the unix layer does not set the CRLF flag.
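If you want to repeat the file-handle comparison without a YAML module, something along these lines should do. It is only a sketch: layer_report is a name I made up, and the temporary file stands in for 'ttt'. It relies on PerlIO::get_layers returning name, argument, and flags triplets when called with details => 1. The layer names and flag values you see will depend on the platform and the perl build, so no particular output is claimed here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: one formatted line per layer on $fh,
# built from the (name, argument, flags) triplets.
sub layer_report {
    my ($fh) = @_;
    my @details = PerlIO::get_layers($fh, details => 1);
    my @report;
    while (my ($name, $arg, $flags) = splice @details, 0, 3) {
        push @report, sprintf '%-10s %-8s 0x%08x',
            $name, defined $arg ? $arg : '', $flags;
    }
    return @report;
}

# A temporary file stands in for 'ttt'; it is removed at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $fh, '>:encoding(utf8)', $path
    or die "Cannot open $path: $!";
my @report = layer_report($fh);
print "$_\n" for @report;
close $fh;
```

Calling layer_report(\*STDOUT) before and after the binmode call gives the corresponding view of the console handle.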