“UnicodeEncodeError: 'ascii' codec can't encode character”

星月不相逢 2020-12-02 19:01

I'm trying to pass big strings of random HTML through regular expressions and my Python 2.6 script is choking on this:

UnicodeEncodeError: 'ascii' codec can't e

4 Answers
  •  青春惊慌失措
    2020-12-02 19:27

    You're trying to encode a unicode string as ASCII in "strict" mode, which is the default and raises an error on any non-ASCII character:

    >>> help(str.encode)
    Help on method_descriptor:
    
    encode(...)
        S.encode([encoding[,errors]]) -> object
    
        Encodes S using the codec registered for encoding. encoding defaults
        to the default encoding. errors may be given to set a different error
        handling scheme. Default is 'strict' meaning that encoding errors raise
        a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
        'xmlcharrefreplace' as well as any other name registered with
        codecs.register_error that is able to handle UnicodeEncodeErrors.
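
The help text mentions that any name registered with codecs.register_error can be used as the errors argument. A minimal sketch of that, assuming a made-up handler name 'tag_codepoint' and a made-up function tag_unencodable (neither is part of Python or of the original answer):

```python
import codecs


def tag_unencodable(exc):
    """Hypothetical handler: replace each unencodable character with [U+XXXX]."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only deals with encoding failures.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = u''.join(u'[U+%04X]' % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding should resume.
    return replacement, exc.end


# 'tag_codepoint' is an illustrative name chosen for this example.
codecs.register_error('tag_codepoint', tag_unencodable)

encoded = u'Protection\u2122'.encode('ascii', 'tag_codepoint')
print(encoded)
```

The handler receives the UnicodeEncodeError instance, so it can inspect exc.object, exc.start and exc.end to decide what to substitute.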
    

    You probably want something like one of the following:

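    # -*- coding: utf-8 -*-
    # The coding line above is needed when this is saved as a Python 2 script,
    # because the source contains the non-ASCII character ™.
    s = u'Protection™'

    print s.encode('ascii', 'ignore')             # drops the ™
    print s.encode('ascii', 'replace')            # replaces the ™ with ?
    print s.encode('ascii', 'xmlcharrefreplace')  # turns the ™ into the XML character reference &#8482;
    print s.encode('ascii', 'strict')             # raises UnicodeEncodeError (the default behaviour)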
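
To compare the error handlers side by side, here is a small sketch that collects each result and catches the exception raised in strict mode. The names results and strict_error are only illustrative, and the trademark sign is written as the escape \u2122 so the snippet does not depend on the source file encoding:

```python
s = u'Protection\u2122'  # same text as above; \u2122 is the trademark sign

# Encode with each non-raising handler and keep the result for comparison.
results = {}
for handler in ('ignore', 'replace', 'xmlcharrefreplace'):
    results[handler] = s.encode('ascii', handler)

# 'strict' raises; catch the exception to see where encoding failed.
try:
    s.encode('ascii', 'strict')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

for handler in ('ignore', 'replace', 'xmlcharrefreplace'):
    print(handler, results[handler])
if strict_error is not None:
    print('strict failed at index', strict_error.start, 'reason:', strict_error.reason)
```

The exception carries the offending position (start, end) and the codec name (encoding), which helps locate the problem character in a large HTML string.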
    
