Create (sane/safe) filename from any (unsafe) string

故事扮演 submitted on 2019-11-30 03:06:51
Remi

Python:

"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()

This accepts Unicode letters and digits but removes line breaks, punctuation and other non-alphanumeric characters (spaces are kept, except trailing ones).

example:

filename = u"ad\nbla'{-+\)(ç?"

gives: adblaç

Edit: str.isalnum() does the alphanumeric check in one step (comment from queueoverflow below). danodonovan suggested keeping the dot as well:

    keepcharacters = (' ','.','_')
    "".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()

My requirements were conservative (the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSes). I ended up with:

    "".join([c for c in text if re.match(r'\w', c)])

That whitelists the alphanumeric characters (a-z, A-Z, 0-9) and the underscore. Note that on Python 3, \w also matches non-ASCII letters and digits in str patterns unless the re.ASCII flag is passed. The regular expression can be compiled and cached for efficiency if there are a lot of strings to be matched. In my case, it wouldn't have made any significant difference.
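For what it's worth, here is a minimal sketch of the compiled-and-cached variant, assuming Python 3. The names _ASCII_WORD and ascii_word_only are made up for illustration, and the re.ASCII flag is an addition so that \w really only matches a-z, A-Z, 0-9 and the underscore:

```python
import re

# Compiled once at import time and reused for every string.
# re.ASCII restricts \w to [a-zA-Z0-9_] (an assumption about what is wanted).
_ASCII_WORD = re.compile(r"\w", re.ASCII)


def ascii_word_only(text):
    """Keep only ASCII letters, digits and underscores (hypothetical helper)."""
    return "".join(c for c in text if _ASCII_WORD.match(c))


print(ascii_word_only("ad\nbla'{-+\\)(ç1?"))  # adbla1
```

Without re.ASCII the same call would also keep the ç.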

uglycoyote

There are a few reasonable answers here, but in my case I want to take a string which might have spaces and punctuation and, rather than just removing those, replace them with underscores. Even though spaces are allowed in filenames on most OSes, they are problematic. Also, in my case, if the original string contained a period I didn't want that to pass through into the filename, or it would generate "extra extensions" that I might not want (I'm appending the extension myself).

def make_safe_filename(s):
    def safe_char(c):
        if c.isalnum():
            return c
        else:
            return "_"
    return "".join(safe_char(c) for c in s).rstrip("_")

print(make_safe_filename( "hello you crazy $#^#& 2579 people!!! : die!!!" ) + ".gif")

prints:

hello_you_crazy_______2579_people______die.gif
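If the long runs of underscores bother you, one possible variation (not part of the answer above, just a sketch) collapses each run of non-alphanumeric characters into a single underscore. The name make_compact_filename is hypothetical, and unlike the original it strips underscores from both ends:

```python
import re


def make_compact_filename(s):
    """Hypothetical variant: each run of non-alphanumerics becomes one underscore."""
    # \W matches anything that is not a letter, digit or underscore;
    # the underscore is listed explicitly so existing ones join the run.
    return re.sub(r"[\W_]+", "_", s).strip("_")


print(make_compact_filename("hello you crazy $#^#& 2579 people!!! : die!!!") + ".gif")
# hello_you_crazy_2579_people_die.gif
```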

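More or less what has been mentioned here with regexp, but in reverse (replace anything NOT listed). The session below is from Python 2, where \w is ASCII-only by default; on Python 3 the ç would be kept unless you pass flags=re.ASCII: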

>>> import re
>>> filename = u"ad\nbla'{-+\)(ç1?"
>>> re.sub(r'[^\w\d-]','_',filename)
u'ad_bla__-_____1_'

No solutions here, only problems that you must consider:

  • what is the smallest maximum filename length among your targets? (e.g. DOS allows only 8.3 names, i.e. 8 characters plus a 3-character extension; most modern filesystems cap a name at around 255 characters or bytes)

  • what filenames are forbidden in some context? (Windows still doesn't support saving a file as CON.TXT -- see https://blogs.msdn.microsoft.com/oldnewthing/20031022-00/?p=42073)

  • remember that . and .. have specific meanings (current/parent directory) and are therefore unsafe.

  • is there a risk that filenames will collide -- either due to removal of characters or the same filename being used multiple times?
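The first three points can at least be checked mechanically. A rough sketch, with several assumptions: the helper name filename_problems is made up, the reserved-name list is the commonly cited set of Windows device names and may not be exhaustive, and 255 is only a guess at your tightest length limit. Collisions are not handled here, since they depend on what is already on disk.

```python
# Device names Windows reserves regardless of extension (CON.TXT is still CON).
# Assumed, non-exhaustive list.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}

MAX_LENGTH = 255  # assumed limit; lower it for the most restrictive target


def filename_problems(name, max_length=MAX_LENGTH):
    """Return a list of reasons why `name` may be unsafe (hypothetical helper)."""
    problems = []
    if name in ("", ".", ".."):
        problems.append("empty or a directory reference")
    # Compare the part before the first dot, case-insensitively.
    if name.split(".")[0].upper() in WINDOWS_RESERVED:
        problems.append("reserved device name on Windows")
    if len(name) > max_length:
        problems.append("longer than %d characters" % max_length)
    return problems


print(filename_problems("CON.TXT"))     # ['reserved device name on Windows']
print(filename_problems("report.txt"))  # []
```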

Consider just hashing the data and using the hex digest of that as the filename.
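A minimal sketch of the hashing idea, assuming SHA-256 is acceptable and that "the data" is either the original string or the file's bytes. The helper name hashed_filename and the extension parameter are illustrative only:

```python
import hashlib


def hashed_filename(data, extension=""):
    """Name a file after the SHA-256 hex digest of `data` (hypothetical helper)."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # hashlib works on bytes, not text
    return hashlib.sha256(data).hexdigest() + extension


print(hashed_filename("hello you crazy $#^#& 2579 people!!!", ".gif"))
```

The result is always 64 characters from 0-9 and a-f plus whatever extension you append, so it sidesteps the character and reserved-name problems, and identical input always maps to the same name. The price is that the name is no longer human-readable.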

Python:

for c in r'[]/\;,><&*:%=+@!#^()|?^':
    filename = filename.replace(c,'')

(Just an example of characters you will want to remove.) The r in front of the string makes it a raw string literal, so the backslash \ is taken literally and gets removed as well.

Edit: regex solution in Python:

import re
# brackets and the backslash must be escaped inside the character class
filename = re.sub(r'[\[\]/\\;,><&*:%=+@!#^()|?]', '', filename)