How can I decode UTF-16 data in Perl when I don't know the byte order?

心在旅途 2020-12-30 13:12

If I open a file (and specify an encoding directly):

open(my $file, "<:encoding(UTF-16)", "some.file") || die "error $!\n";
while(<$file>) {


        
3 Answers
  •  夕颜 (original poster)
    2020-12-30 13:54

    What you're trying to do is impossible.

    You're reading lines of text without specifying an encoding, so every byte equal to the newline character (\x0a by default) ends a line. But that byte may well sit in the middle of a UTF-16 character, in which case the next line can't be decoded. If your data is UTF-16LE, this happens at every line break, because a line feed is encoded as \x0a \x00 and the split leaves the \x00 at the start of the next line. If you have UTF-16BE, you might get lucky (a line feed is \x00 \x0a) until you hit a character with \x0a in its high byte.
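
To make the byte-level problem concrete, here is a small sketch that reads encoded text through an undecoded in-memory handle and reports how many bytes each "line" contains. The helper name `raw_line_lengths` and the sample strings are made up for illustration; U+0A05 is used only because its high byte is \x0a.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Read "lines" from a byte string the way an undecoded handle does:
# split after every 0x0a byte, with no knowledge of UTF-16 code units.
sub raw_line_lengths {
    my ($bytes) = @_;
    open(my $fh, "<:raw", \$bytes) or die "cannot open in-memory handle: $!";
    my @lengths = map { length($_) } <$fh>;
    close($fh);
    return @lengths;
}

# The same two-line text in both byte orders, without a BOM.
my @le = raw_line_lengths(encode("UTF-16LE", "ab\ncd\n"));
my @be = raw_line_lengths(encode("UTF-16BE", "ab\ncd\n"));

# UTF-16BE text containing U+0A05, whose high byte is 0x0a.
my @be_unlucky = raw_line_lengths(encode("UTF-16BE", "x\x{0A05}y\n"));

print "UTF-16LE: @le\n";
print "UTF-16BE: @be\n";
print "UTF-16BE with U+0A05: @be_unlucky\n";
```

This prints `5 6 1` for UTF-16LE, `6 6` for UTF-16BE, and `3 5` for the UTF-16BE text containing U+0A05. An odd byte count cannot be a whole number of 16-bit code units, and the 6-byte UTF-16LE chunk starts with the \x00 left over from the previous line feed, so it is misaligned even though its length is even.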

    So don't do that; open the file with the right encoding.
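
One possible way to pick "the right encoding" when the byte order is not known in advance is to look for a byte order mark before opening the file for decoding. The sketch below is an assumption-laden illustration, not something from this thread: `utf16_layer_for` and `read_utf16_lines` are hypothetical helpers, and the `$fallback` encoding used when no BOM is present is only the caller's guess.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a PerlIO layer by looking for a UTF-16
# byte order mark in the first two bytes of the file.
sub utf16_layer_for {
    my ($path, $fallback) = @_;
    open(my $raw, "<:raw", $path) or die "cannot open $path: $!";
    my $count = read($raw, my $head, 2);
    close($raw);
    if (defined $count && $count == 2
        && ($head eq "\xFF\xFE" || $head eq "\xFE\xFF")) {
        # With a BOM present, the generic UTF-16 decoder picks the byte
        # order itself and consumes the BOM.
        return ":encoding(UTF-16)";
    }
    # No BOM: the byte order cannot be known, so use the caller's guess.
    return ":encoding($fallback)";
}

# Hypothetical helper: read decoded lines, so that line splitting happens
# on characters rather than on raw 0x0a bytes.
sub read_utf16_lines {
    my ($path, $fallback) = @_;
    my $layer = utf16_layer_for($path, $fallback);
    open(my $fh, "<$layer", $path) or die "cannot open $path: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}
```

A call such as `read_utf16_lines("some.file", "UTF-16LE")` would then return decoded lines. If the file has no BOM and the fallback guess is wrong, the result is garbage or a decoding error, which is the underlying limitation: without a BOM or outside knowledge, the byte order can only be guessed.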
