Question
What is the most cross-platform way of removing bad path characters (e.g. "\" or ":" on Windows) in Python?
Solution
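Because there seems to be no ideal solution, I decided to be relatively restrictive and used the following code:
def remove(value, deletechars):
    # Strip every character listed in deletechars from value.
    for c in deletechars:
        value = value.replace(c, '')
    return value

filename = 'some:file?.txt'  # example input, not from the original post
print(remove(filename, '\\/:*?"<>|'))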
Answer 1:
Unfortunately, the set of acceptable characters varies by OS and by filesystem.
- Windows:
  - Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
    - The following reserved characters are not allowed: < > : " / \ | ? *
    - Characters whose integer representations are in the range from zero through 31 are not allowed.
    - Any other character that the target file system does not allow.
  - The list of accepted characters can vary depending on the OS and locale of the machine that first formatted the filesystem.
  - .NET has GetInvalidFileNameChars and GetInvalidPathChars, but I don't know how to call those from Python.
- Mac OS: NUL is always excluded, "/" is excluded from the POSIX layer, ":" is excluded from Apple APIs
  - HFS+: any sequence of non-excluded characters that is representable by UTF-16 in the Unicode 2.0 spec
  - HFS: any sequence of non-excluded characters representable in MacRoman (default) or other encodings, depending on the machine that created the filesystem
  - UFS: same as HFS+
- Linux:
  - native (UNIX-like) filesystems: any byte sequence excluding NUL and "/"
  - FAT, NTFS, other non-native filesystems: varies
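As a rough illustration of the Windows rules listed above, here is a minimal sketch that strips the reserved characters and the control characters 0 through 31. The name strip_windows_invalid and its replacement parameter are made up for this example, and it only covers the characters in that list, not anything else a particular filesystem might reject.

```python
import re

# Reserved on Windows per the list above: < > : " / \ | ? *
# plus control characters with code points 0 through 31.
_WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def strip_windows_invalid(name, replacement=''):
    """Hypothetical helper: drop (or replace) characters Windows rejects."""
    return _WINDOWS_INVALID.sub(replacement, name)


print(strip_windows_invalid('re:port*.txt'))       # report.txt
print(strip_windows_invalid('re:port*.txt', '_'))  # re_port_.txt
```

Passing a replacement such as '_' instead of deleting makes it less likely that two different inputs collapse into the same name.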
Your best bet is probably to either be overly conservative on all platforms, or to just try creating the file and handle any errors.
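For the "just try it" option, a minimal sketch might look like the following. The helper can_create is hypothetical, it assumes Python 3 (for the 'x' open mode), and it only tells you whether the name works on the filesystem backing that one directory.

```python
import os
import tempfile


def can_create(directory, name):
    """Hypothetical helper: True if `name` is usable as a file name in `directory`."""
    # Reject empty names, '.', '..' and anything containing a path separator.
    if name in ('', '.', '..') or os.path.basename(name) != name:
        return False
    path = os.path.join(directory, name)
    try:
        # 'x' mode refuses to overwrite an existing file.
        with open(path, 'x'):
            pass
    except FileExistsError:
        # The name is already in use there, so it is clearly acceptable.
        return True
    except (OSError, ValueError):
        # OSError: the OS or filesystem rejected the name.
        # ValueError: Python rejects names with an embedded NUL.
        return False
    os.remove(path)  # clean up the probe file
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(can_create(tmp, 'report.txt'))  # True
    print(can_create(tmp, 'bad\0name'))   # False
```

The probe file is removed right away, so this is only a check. In real code you would more likely wrap the actual open() call in the same try/except.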
Answer 2:
I think the safest approach here is to just replace any suspicious characters. So, I think you can just replace (or get rid of) anything that isn't alphanumeric, -, _, a space, or a period. And here's how you do that:
import re
re.sub(r'[^\w\-_\. ]', '_', filename)
The above replaces every character that is not a letter, digit, '_', '-', '.' or space with an '_'. So, if you're looking at an entire path, you'll want to add os.sep to the list of approved characters as well.
Here's some sample output:
In [27]: re.sub(r'[^\w\-_\. ]', '_', 'some\\*-file._n\\\\ame')
Out[27]: 'some__-file._n__ame'
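If you do want to run this over a whole path, one way to add os.sep to the approved characters is sketched below. The function name sanitize_path is just for illustration. re.escape is used because os.sep is a backslash on Windows, which would otherwise be read as the start of an escape inside the character class.

```python
import os
import re


def sanitize_path(path, replacement='_'):
    """Hypothetical helper: like the regex above, but keeps os.sep intact."""
    # Allowed: word characters, '-', '.', space, and the platform separator.
    allowed = r'\w\-. ' + re.escape(os.sep)
    return re.sub('[^' + allowed + ']', replacement, path)


example = os.sep.join(['reports', '2009', 'q1:draft?.txt'])
print(sanitize_path(example))
```

Note that on Windows this would also replace the colon in a drive prefix such as C:, so you may want to split the drive off first with os.path.splitdrive.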
Answer 3:
If you are using Python, try os.path to avoid cross-platform issues with paths.
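os.path will not remove bad characters by itself, but it can take the path apart so that only the file name gets cleaned. A small sketch, assuming you only care about the last component (sanitize_basename is a made-up name, and the regex is the conservative one from Answer 2):

```python
import os
import re


def sanitize_basename(path):
    """Hypothetical helper: clean only the final component of `path`."""
    directory, name = os.path.split(path)
    # Replace anything that is not a word character, '-', '.' or space.
    safe_name = re.sub(r'[^\w\-. ]', '_', name)
    return os.path.join(directory, safe_name)


example = os.path.join('data', 'out', 're:port*.txt')
print(sanitize_basename(example))
```

Because the split and the join both go through os.path, the directory part keeps whatever separator the current platform uses.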
Answer 4:
The path separator is available as os.sep. It is "\" on Windows and "/" on POSIX systems (classic Mac OS used ":"), depending on which system you're on.
Source: https://stackoverflow.com/questions/1033424/how-to-remove-bad-path-characters-in-python