I want to create a sane/safe filename (i.e. somewhat readable, no \"strange\" characters, etc.) from some random Unicode string (mich might contain just anything).
(
My requirements were conservative ( the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSs ). I ended up with:
"".join([c for c in text if re.match(r'\w', c)])
That white lists the alphanumeric characters ( a-z, A-Z, 0-9 ) and the underscore. The regular expression can be compiled and cached for efficiency, if there are a lot of strings to be matched. For my case, it wouldn't have made any significant difference.