Python UnicodeDecodeError when writing German letters

星月不相逢 2020-12-17 21:53

I've been banging my head on this error for some time now and I can't seem to find a solution anywhere on SO, even though there are similar questions.

Here's my c

4 Answers
  •  伪装坚强ぢ
    2020-12-17 22:38

    As already suggested, your error comes from this line:

    f.write(("\"%s\" = \"%s\";\n" % ("no_internet", value)).encode("utf-8"))
    

    it should be:

    f.write(('"{}" = "{}";\n'.format('no_internet', value.encode('utf-8'))))
    

    A note on unicode and encodings

    When working with Python 2, software should work only with unicode strings internally, converting to a particular encoding on output.

    To avoid making the same error over and over again, make sure you understand the difference between the ascii and utf-8 encodings, and between str and unicode objects in Python.
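
A minimal sketch of that principle, assuming the value is already a unicode string. `io.open` is available in Python 2.6+ and Python 3 and does the encoding at write time; the helper name, file name and German sample text are made up for illustration and are not from the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_strings_entry(path, key, value):
    # key and value are unicode strings; no manual .encode() call is needed
    line = u'"{}" = "{}";\n'.format(key, value)
    # io.open encodes to UTF-8 only at the moment of writing
    with io.open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(line)
    return line


# hypothetical file name and sample text
path = os.path.join(tempfile.mkdtemp(), "Localizable.strings")
line = write_strings_entry(path, u"no_internet", u"Keine Verbindung m\u00f6glich")

# read the file back as raw bytes to see what actually landed on disk
with io.open(path, "rb") as f:
    raw = f.read()
```

The text stays unicode until the file object encodes it, so there is no place left where an implicit ascii conversion could happen.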

    The difference between ASCII and UTF-8 encoding:

    ASCII needs just one byte per character (only 7 of its 8 bits are used) to represent every character in the ascii charset. UTF-8 needs between one and four bytes per code point to represent the complete Unicode charset.

    ascii (default)
    1    If the code point is < 128, each byte is the same as the value of the code point.
    2    If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)
    
    utf-8 (unicode transformation format)
    1    If the code point is <128, it’s represented by the corresponding byte value.
    2    If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
    3    Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.
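
The byte counts in the two lists above can be checked directly. This is a small sketch with one arbitrarily chosen sample character per UTF-8 length class; the variable names are illustrative only.

```python
# one sample per UTF-8 length class: a, ü, €, and an emoji (U+1F600)
samples = [u"a", u"\u00fc", u"\u20ac", u"\U0001F600"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# every byte of a multi-byte sequence lies in the range 128..255
multibyte_ok = all(b >= 128 for b in bytearray(u"\u20ac".encode("utf-8")))

# ascii cannot represent code points >= 128
try:
    u"\u00fc".encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = type(exc).__name__
```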
    

    The difference between str and unicode objects:

    In Python 2, str is basically a byte string and unicode is a string of Unicode code points. A str holds bytes in some encoding such as ascii or utf-8; a unicode object has no encoding of its own until you encode it.

    str vs. unicode
    1   str     = byte string (8-bit) - uses \x and two digits
    2   unicode = unicode string      - uses \u and four digits
    3   basestring
            /\
           /  \
        str    unicode
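
A short sketch of the difference, written so that it runs on both Python 2 and Python 3. In Python 3 the same two types are named `bytes` and `str`; the variable names here are illustrative.

```python
text = u"\u00fc"             # unicode string: one code point (ü)
raw = text.encode("utf-8")   # byte string: the same character as two bytes

text_len = len(text)         # counts code points
raw_len = len(raw)           # counts bytes
same_bytes = raw == b"\xc3\xbc"
round_trip = raw.decode("utf-8") == text
```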
    

    If you follow a few simple rules, you should be fine handling str/unicode objects in different encodings such as ascii, utf-8, or whatever encoding you have to use:

    Rules
    1    encode(): Gets you from Unicode -> bytes
         encode([encoding], [errors='strict']) returns an 8-bit string version of the Unicode string.
    2    decode(): Gets you from bytes -> Unicode
         decode([encoding], [errors]) interprets the 8-bit string using the given encoding.
    3    codecs.open(encoding="utf-8"): Read and write files directly to/from Unicode (you can use any encoding, not just utf-8, but utf-8 is most common).
    4    u'': Makes your string literals into Unicode objects rather than byte sequences.
    5    unicode(string[, encoding, errors]): Builds a Unicode object from a byte string, decoding it with the given encoding.
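
Rules 1 to 4 can be tried out in a few lines. This is only a sketch: the sample text and file name are made up, and rule 5 is left out because `unicode()` exists only in Python 2.

```python
import codecs
import os
import tempfile

text = u"Gr\u00fc\u00dfe"                # rule 4: a Unicode literal ("Grüße")
raw = text.encode("utf-8")               # rule 1: Unicode -> bytes
back = raw.decode("utf-8")               # rule 2: bytes -> Unicode
# with errors="replace", undecodable bytes become U+FFFD instead of raising
lossy = raw.decode("ascii", "replace")

# rule 3: codecs.open reads and writes Unicode directly
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # hypothetical file
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(text)
with codecs.open(path, "r", encoding="utf-8") as f:
    from_file = f.read()
```

Decoding with the wrong codec and `errors="replace"` loses information, so it is better suited to logging or diagnostics than to data you need to keep.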
    

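    Warning: Don’t use encode() on bytes or decode() on Unicode objects. In Python 2 both calls first trigger an implicit conversion with the ascii codec, which is where an unexpected UnicodeDecodeError or UnicodeEncodeError usually comes from.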
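
To illustrate the warning: in Python 2, calling encode() on a byte string first decodes it implicitly with the ascii codec. The sketch below spells that hidden step out explicitly so that it behaves the same on Python 2 and Python 3 (where bytes has no encode() method at all). The variable names are illustrative.

```python
raw = u"\u00fc".encode("utf-8")   # a byte string that already holds UTF-8 data

try:
    # what Python 2 effectively does for raw.encode("utf-8"):
    # an implicit ascii decode, followed by the requested encode
    raw.decode("ascii").encode("utf-8")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# the right direction for bytes is decode(), using the encoding they were written in
text = raw.decode("utf-8")
```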

    And again: Software should only work with Unicode strings internally, converting to a particular encoding on output.
