How do I write a file whose *filename* contains utf8 characters in Perl?

走远了吗. Submitted on 2019-12-01 03:39:09

First of all, saying "UTF-8 character" is weird. UTF-8 can encode any Unicode character, so the UTF-8 character set is the Unicode character set. That means you want to create a file whose name contains Unicode characters, and more specifically, Unicode characters that aren't in cp1252.

I've answered this on PerlMonks in the past. Answer copied below.


Perl treats file names as opaque strings of bytes. That means that file names need to be encoded as per your "locale"'s encoding (ANSI code page).

In Windows, code page 1252 is commonly used, and thus the encoding is usually cp1252.* However, cp1252 doesn't support Tamil and Hindi characters [or "☺"].
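To check up front whether a given name can be expressed in the ANSI code page at all, something like the following might do. It is only a sketch: cp1252_file_name is a made-up helper name, and cp1252 is assumed as the code page.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the cp1252 bytes for a file name, or undef
# if some character in the name has no cp1252 representation.
sub cp1252_file_name {
    my ($name) = @_;
    # FB_CROAK makes encode() die on unmappable characters instead of
    # silently substituting "?"; LEAVE_SRC keeps $name unmodified.
    return eval {
        encode('cp1252', $name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
}

my $ok  = cp1252_file_name("caf\x{E9}.txt");   # U+00E9 is in cp1252
my $bad = cp1252_file_name("\x{263A}.txt");    # U+263A (the smiley) is not

print defined $ok  ? "encodable\n" : "not encodable\n";
print defined $bad ? "encodable\n" : "not encodable\n";
```

The first name should come back as bytes you can hand to open; the second should come back undef, which is the case where you need the wide interface described next.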

Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins**. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.

The aforementioned Win32::Unicode appears to handle some of the dirty work of using Win32API::File for you. I'd also recommend starting with that.

* — The code page is returned (as a number) by the GetACP system call. Prepend "cp" to get the encoding.

** — Perl's support for Windows sucks in some respects.
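As a rough illustration of the first footnote, the encoding name could be derived like this. ansi_encoding_name is a hypothetical helper; Win32::GetACP() comes from the Win32 module and is only called when actually running on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a numeric ANSI code page into an Encode-style
# name by prepending "cp". Without an argument it asks Windows through
# Win32::GetACP(); on other systems it returns nothing, since there is no
# ANSI code page to query.
sub ansi_encoding_name {
    my ($code_page) = @_;
    if (!defined $code_page) {
        return unless $^O eq 'MSWin32';
        require Win32;
        $code_page = Win32::GetACP();
    }
    return "cp$code_page";
}

my $encoding = ansi_encoding_name();
print defined $encoding
    ? "ANSI encoding: $encoding\n"
    : "Not on Windows; no ANSI code page\n";
```

The resulting name can be passed to Encode's encode() when preparing a file name for the builtins.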

The following runs on Windows 7 with ActiveState Perl. It writes "hello there" to a file with Hebrew characters in its name:

#-----------------------------------------------------------------------
# Unicode file names on Windows using Perl
# Philip R Brenan at gmail dot com, Appa Apps Ltd, 2013
#-----------------------------------------------------------------------

use feature ":5.16";
use Encode qw(encode);
use Win32API::File qw(:ALL);

# Create a file with a unicode name

my $e  = "\x{05E7}\x{05EA}\x{05E7}\x{05D5}\x{05D5}\x{05D4}".
         "\x{002E}\x{0064}\x{0061}\x{0074}\x{0061}"; # File name as Unicode code points
my $g  = encode("UTF-16LE", $e);  # Bytes in the form the wide (W) API expects
   $g .= chr(0).chr(0);           # 0 terminate string
my $F  = Win32API::File::CreateFileW
 ($g, GENERIC_WRITE, 0, [], OPEN_ALWAYS, 0, 0); #  Create file via Win32API
die "CreateFileW failed: $^E" unless $F;       # $^E can be set even on success, so test the handle

# Write to the file

OsFHandleOpen(FILE, $F, "w") or die "Cannot open file";
binmode FILE;                      
print FILE "hello there\n";      
close(FILE);

There is no need to encode the filename (at least not on Linux). This code works on my Linux system:

use warnings;
use strict;

#   Text is stored in utf8 within *this* file.
use utf8;

my $with_smiley = $ARGV[0] || 0;

my $filename = 'äöü' .
  ($with_smiley ? '☺' : '' ).
     '.txt';

open my $fh, '>', $filename or die "open: $!";

binmode $fh, ':utf8';

print $fh "Filename: $filename\n";

close $fh;

HTH, Paul
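If you would rather not rely on how Perl happens to store the string internally, a variant that encodes the name explicitly could look like this. It is a sketch that assumes a Linux-style file system that takes file names as raw bytes and a UTF-8 naming convention; the temporary directory is only there to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Work in a throwaway directory so the example cleans up after itself.
my $dir = tempdir(CLEANUP => 1);

# The name as Unicode characters, written with escapes so the example does
# not depend on the encoding of this source file.
my $name = "\x{E4}\x{F6}\x{FC}\x{263A}.txt";

# Encode the name to UTF-8 bytes before handing it to open().
my $path = "$dir/" . encode('UTF-8', $name);

open my $fh, '>:encoding(UTF-8)', $path or die "open: $!";
print {$fh} "Filename: $name\n";
close $fh or die "close: $!";

# Read the directory back and decode the bytes to characters again.
opendir my $dh, $dir or die "opendir: $!";
my @found = map { decode('UTF-8', $_) } grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

print "created ", scalar(@found), " file(s)\n";
```

Decoding what readdir returns is the mirror image of encoding before open, so the name survives the round trip as characters rather than bytes.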
