可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am writing a script that takes a UTF-16 encoded text file as input and outputs a UTF-16 encoded text file.
use open "encoding(UTF-16)"; open INPUT, "< input.txt" or die "cannot open > input.txt: $!\n"; open(OUTPUT,"> output.txt"); while(<INPUT>) { print OUTPUT "$_\n" }
Let's just say that my program writes everything from input.txt into output.txt.
This WORKS perfectly fine in my cygwin environment, which is using "This is perl 5, version 14, subversion 2 (v5.14.2) built for cygwin-thread-multi-64int"
But in my Windows environment, which is using "This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x64-multi-thread",
Every line in output.txt is pre-pended with crazy symbols except the first line.
For example:
Can anyone give some insight on why it works on cygwin but not windows?
EDIT: After printing the encoded layers as suggested.
In Windows environment:
unix crlf encoding(UTF-16) utf8 unix crlf encoding(UTF-16) utf8
In Cygwin environment:
unix perlio encoding(UTF-16) utf8 unix perlio encoding(UTF-16) utf8
The only difference is between the perlio and crlf layer.
回答1:
[ I was going to wait and give a thorough answer, but it's probably better if I give you a quick answer than nothing. ]
The problem is that crlf
and the encoding
layers are in the wrong order. Not your fault.
For example, say you do print "a\nb\nc\n";
using UTF-16le (since it's simpler and it's probably what you actually want). You'd end up with
61 00 0D 0A 00 62 00 0D 0A 00 63 00 0D 0A 00
instead of
61 00 0D 00 0A 00 62 00 0D 00 0A 00 63 00 0D 00 0A 00
I don't think you can get the right results with the open
pragma or with binmode
, but it can be done using open
.
open(my $fh, '<:raw:encoding(UTF-16):crlf', $qfn)
You'll need to append a :utf8
with some older version, IIRC.
It works on cygwin because the crlf
layer is only added on Windows. There you'd get
61 00 0A 00 62 00 0A 00 63 00 0A 00
回答2:
You have a typo in your encoding. It should be use open ":encoding(UTF-16)"
Note the colon. I don't know why it would work on Cygwin but not Windows, but could also be a 5.12 vs 5.14 thing. Perl seems to make up for it, but it could be what's causing your problem.
If that doesn't do it, check if the encoding is being applied to your filehandles.
print map { "$_\n" } PerlIO::get_layers(*INPUT); print map { "$_\n" } PerlIO::get_layers(*OUTPUT);
Use lexical filehandles (ie. open my $fh, "<", $file
). Glob filehandles are global and thus something else in your program might be interfering with them.
If all that checks out, if lexical filehandles are getting the encoding(UTF-16)
applied, let us know and we can try something else.
UPDATE: This may provide your answer: "BOMed UTF files are not suitable for streaming models, and they must be slurped as binary files instead." Looks like you have to read the file in as binary and do the encoding as a string. This may have been a bug fixed in 5.14.
UPDATE 2: Yep, I can confirm this is a bug that was fixed in 5.14.