Where can I find a list of allowed characters in filenames, depending on the operating system?
(E.g. on Linux, the character : is allowed in filenames, but not on Windows.)
You should start with the Wikipedia Filename page. It has a decent-sized table (Comparison of filename limitations), listing the reserved characters for quite a lot of file systems.
It also has a plethora of other information about each file system, including reserved file names such as CON under MS-DOS. I mention that only because I was bitten by that once when I shortened an include file from const.h to con.h and spent half an hour figuring out why the compiler hung.
Turns out DOS ignored extensions for devices, so that con.h was exactly the same as con, the input console (meaning, of course, the compiler was waiting for me to type in the header file before it would continue).
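To catch this kind of collision before it bites, a check along the following lines may help. It is only a sketch: the function name is made up for illustration, and the device list (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9) is the commonly documented set rather than something taken from this thread.

```python
# Reserved DOS/Windows device names; the comparison ignores case and extension.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_device_name(filename):
    """Return True if filename (without a directory part) names a device."""
    # Keep only the part before the first dot, so "con.h" becomes "con".
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED_DEVICE_NAMES


print(is_reserved_device_name("con.h"))    # True
print(is_reserved_device_name("const.h"))  # False
```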
OK, so looking at Comparison of file systems, if you only care about the main players' file systems:
- Windows (FAT32, NTFS): Any Unicode except NUL, \, /, :, *, ?, ", <, >, |
- Mac (HFS, HFS+): Any valid Unicode except : or /
- Linux (ext[2-4]): Any byte except NUL or /
So, to be safe everywhere: any byte except NUL, \, /, :, *, ?, ", <, >, |. You also can't have files or folders called "." or "..", and no control characters (of course).
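The rules above can be turned into a small portability check. This is a minimal sketch, not a complete validator: the function name is hypothetical, the forbidden set includes ?, which Windows also rejects, and "control characters" is taken to mean code points 0-31.

```python
# Characters rejected by at least one of the file systems listed above.
FORBIDDEN_CHARS = set('\\/:*?"<>|')


def portability_problems(name):
    """Return a list of reasons why name is not portable (empty if it is fine)."""
    problems = []
    if name in ("", ".", ".."):
        problems.append("empty or reserved name")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    # NUL and the other ASCII control characters are code points 0-31.
    if any(ord(ch) < 32 for ch in name):
        problems.append("control characters")
    return problems


print(portability_problems("report.txt"))  # []
print(portability_problems("a:b?.txt"))    # ['forbidden characters: : ?']
print(portability_problems(".."))          # ['empty or reserved name']
```

Returning the reasons instead of a plain boolean makes it easier to tell a user what to change.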
To be more precise about Mac OS X (now called macOS): a / in the Finder is translated to a : in the Unix file system.
This was done for backward compatibility when Apple moved from Classic Mac OS.
It is legitimate to use a / in a file name in the Finder; looking at the same file in the terminal, it will show up with a :.
And it works the other way around too: you can't use a / in a file name in the terminal, but a : is OK and will show up as a / in the Finder.
Some applications may be more restrictive and prohibit both characters, either to avoid confusion, because they kept logic from Classic Mac OS, or for name compatibility between platforms.
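The swap described above is symmetric, so a single translation table covers both directions. The helper names here are made up for illustration. This only models the display mapping and does not call any macOS API.

```python
# Swap "/" and ":" in a single pass; the same table works in both directions.
_SWAP = str.maketrans({"/": ":", ":": "/"})


def finder_to_posix(name):
    """Name as shown in the Finder -> name as seen from the terminal."""
    return name.translate(_SWAP)


def posix_to_finder(name):
    """Name as seen from the terminal -> name as shown in the Finder."""
    return name.translate(_SWAP)


print(finder_to_posix("Notes 1/2"))  # Notes 1:2
print(posix_to_finder("Notes 1:2"))  # Notes 1/2
```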
For "English locale" file names, this works nicely. I'm using this for sanitizing uploaded file names. The file name is not meant to be linked to anything on disk; it's only used when the file is being downloaded, hence there are no path checks.

    // PCRE has no "g" modifier: preg_replace() already replaces every match.
    $file_name = preg_replace('/[^\x20-~]+|[\\\\\/:*?"<>|]+/', '_', $client_specified_file_name);

Basically, it replaces every run of non-printable (or non-ASCII) characters, and every run of characters reserved on Windows and other OSs, with an underscore. You can easily extend the pattern to support other locales and functionalities.
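For comparison, a rough Python equivalent of the same idea might look like this. The function name is hypothetical, and the reserved-character class (\ / : * ? " < > |) is my reading of the intent rather than a literal port of the PHP pattern.

```python
import re

# Runs of characters outside printable ASCII (space to ~), or runs of
# \ / : * ? " < > |, are each replaced by a single underscore.
_UNSAFE = re.compile(r'[^\x20-~]+|[\\/:*?"<>|]+')


def sanitize_download_name(name):
    return _UNSAFE.sub("_", name)


print(sanitize_download_name("my:report?.pdf"))  # my_report_.pdf
print(sanitize_download_name("résumé.txt"))      # r_sum_.txt
```

As the second call shows, anything outside ASCII is replaced too, which is why this only suits "English locale" names.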
Here is code to clean a file name in Python.

    import re
    import unicodedata


    def clean_name(name, replace_space_with=None):
        """
        Remove invalid file name chars from the specified name

        :param name: the file name
        :param replace_space_with: if not None, replace spaces with this string
        :return: a valid name for Win/Mac/Linux
        """
        # ref: https://en.wikipedia.org/wiki/Filename
        # ref: https://stackoverflow.com/questions/4814040/allowed-characters-in-filename
        # No control chars, no: /, \, ?, %, *, :, |, ", <, >

        # Remove control chars (every Unicode category starting with "C")
        name = ''.join(ch for ch in name if unicodedata.category(ch)[0] != 'C')
        # Remove the reserved characters listed above
        cleaned_name = re.sub(r'[/\\?%*:|"<>]', '', name)
        if replace_space_with is not None:
            return cleaned_name.replace(' ', replace_space_with)
        return cleaned_name

Source: https://stackoverflow.com/questions/4814040/allowed-characters-in-filename