How do I get str.translate to work with Unicode strings?

孤城傲影 2020-12-01 00:17

I have the following code:

import string
def translate_non_alphanumerics(to_translate, translate_to='_'):
    not_letters_or_digits = u'!"#%\'()*+,-./:


        
7 Answers
  •  萌比男神i
    2020-12-01 00:33

    The Unicode version of translate requires a mapping from Unicode ordinals (which you can retrieve for a single character with ord) to Unicode ordinals. If you want to delete characters, you map to None.
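    A minimal sketch of such a mapping follows. The sample string and the table contents are made up for illustration; the u'' prefixes let the snippet run on Python 2 as well as on Python 3.3 or later.

```python
# Hypothetical table: keys are Unicode ordinals obtained with ord().
table = {
    ord(u'a'): ord(u'b'),  # ordinal -> ordinal: replace 'a' with 'b'
    ord(u'!'): None,       # ordinal -> None: delete '!'
}

result = u'banana!'.translate(table)
print(result)
```

    Characters whose ordinals are not keys in the table pass through unchanged.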

    I changed your function to build a dict that maps the ordinal of every character to be replaced to its replacement:

    def translate_non_alphanumerics(to_translate, translate_to=u'_'):
        not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'
        # Key each entry by the character's ordinal; the value is its replacement.
        translate_table = dict((ord(char), translate_to) for char in not_letters_or_digits)
        return to_translate.translate(translate_table)
    
    >>> translate_non_alphanumerics(u'<foo>!')
    u'_foo__'
    

    Edit: It turns out that the translation mapping must map from the Unicode ordinal (via ord) to another Unicode ordinal, a Unicode string, or None (to delete). I have therefore changed the default value of translate_to to a Unicode literal. For example:

    >>> translate_non_alphanumerics(u'<foo>!', u'bad')
    u'badfoobadbad'
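
    On Python 3, where every str is already Unicode, the same table could be built with str.maketrans instead of calling ord by hand. This is only a sketch under that assumption: the constant name NOT_LETTERS_OR_DIGITS is introduced here for illustration, and the character set is copied from the function above.

```python
# Assumes Python 3: str.maketrans does not exist on Python 2's unicode type.
NOT_LETTERS_OR_DIGITS = '!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'


def translate_non_alphanumerics(to_translate, translate_to='_'):
    # dict.fromkeys maps every listed character to the same replacement;
    # the one-argument form of str.maketrans converts character keys to ordinals.
    table = str.maketrans(dict.fromkeys(NOT_LETTERS_OR_DIGITS, translate_to))
    return to_translate.translate(table)


print(translate_non_alphanumerics('<foo>!'))
print(translate_non_alphanumerics('<foo>!', 'bad'))
```

    Passing None as translate_to deletes the listed characters rather than replacing them.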
    
