Are there any standalonenish solutions for normalizing international unicode text to safe ids and filenames in Python?
E.g. turn My International Text: åäö
The following will remove accents from whatever characters Unicode can decompose into combining pairs, discard any weird characters it can't, and nuke whitespace:
# encoding: utf-8
from unicodedata import normalize
import re
original = u'ľ š č ť ž ý á í é'
decomposed = normalize("NFKD", original)
no_accent = ''.join(c for c in decomposed if ord(c)<0x7f)
no_spaces = re.sub(r'\s', '_', no_accent)
print no_spaces
# output: l_s_c_t_z_y_a_i_e
It doesn't try to get rid of characters disallowed on filesystems, but you can steal DANGEROUS_CHARS_REGEX
from the file you linked for that.