Python UnicodeDecodeError when writing German letters

星月不相逢 2020-12-17 21:53

I've been banging my head on this error for some time now and I can't seem to find a solution anywhere on SO, even though there are similar questions.

Here's my c

4 answers

伪装坚强ぢ (OP)

2020-12-17 22:38

As already suggested, your error comes from this line:

f.write(("\"%s\" = \"%s\";\n" % ("no_internet", value)).encode("utf-8"))

It should be:

f.write(('"{}" = "{}";\n'.format('no_internet', value.encode('utf-8'))))

A note on unicode and encodings

When working with Python 2, software should only work with unicode strings internally, converting to a particular encoding on output.
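As a minimal sketch of that rule: the text stays Unicode until the very last step and is encoded once, when it is written out. It is written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`. The helper `format_entry` and the sample German value are made up for illustration; only the `"no_internet"` key comes from the question.

```python
import io


def format_entry(key, value):
    """Build one '"key" = "value";' line as text; nothing is encoded yet."""
    return u'"{}" = "{}";\n'.format(key, value)


# Inside the program everything is Unicode text (hypothetical sample value).
line = format_entry(u"no_internet", u"Keine Internetverbindung möglich")

# Encoding happens exactly once, at the output boundary.
buffer = io.BytesIO()  # stands in for a file opened in binary mode
buffer.write(line.encode("utf-8"))
data = buffer.getvalue()
```

Because the encode step sits in one place, switching the output encoding later means changing a single call.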

To avoid making the same error over and over again, make sure you understand the difference between the ASCII and UTF-8 encodings, and also between str and unicode objects in Python.

The difference between ASCII and UTF-8 encoding:

ASCII needs just one byte per character to represent every character in the ASCII charset. UTF-8 needs up to four bytes per character to represent the complete Unicode charset.

ascii (the default encoding in Python 2)
1    If the code point is < 128, each byte is the same as the value of the code point.
2    If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)
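The two ascii rules can be checked directly. This is only a sketch in Python 3 syntax (where `str` is the Unicode type), and the sample words are my own:

```python
text_ascii = u"Strasse"   # every code point is < 128
text_german = u"Straße"   # 'ß' is code point 223

# Rule 1: each byte has the same value as the code point it encodes.
encoded = text_ascii.encode("ascii")
same_values = list(bytearray(encoded)) == [ord(ch) for ch in text_ascii]

# Rule 2: a code point >= 128 cannot be represented, so encoding fails.
try:
    text_german.encode("ascii")
    bad_char = None
except UnicodeEncodeError as exc:
    # exc.object is the input string; start/end mark the offending slice.
    bad_char = exc.object[exc.start:exc.end]
```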

utf-8 (Unicode Transformation Format, 8-bit)
1    If the code point is < 128, it’s represented by the corresponding byte value.
2    If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.
3    Code points > 0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.
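To see the three utf-8 rules in action, here is a small sketch (Python 3 syntax) that encodes one character from each range. The helper name `utf8_bytes` and the sample characters are just my choice:

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a list of byte values."""
    return list(bytearray(ch.encode("utf-8")))


one_byte = utf8_bytes(u"a")             # U+0061, below 128
two_bytes = utf8_bytes(u"\u00e4")       # 'ä', between 128 and 0x7ff
three_bytes = utf8_bytes(u"\u20ac")     # euro sign, above 0x7ff
four_bytes = utf8_bytes(u"\U0001F600")  # an emoji outside the 16-bit range

lengths = [len(b) for b in (one_byte, two_bytes, three_bytes, four_bytes)]
```

Here `lengths` comes out as `[1, 2, 3, 4]`, and every byte in the multi-byte sequences is 128 or greater, which is why plain ASCII text is also valid UTF-8.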

The difference between str and unicode objects:

You can say that str is basically a byte string and unicode is a Unicode string. A str holds bytes in some encoding such as ascii or utf-8, whereas a unicode object holds code points and only gets an encoding when you encode it.

str vs. unicode
1   str     = byte string (8-bit) - uses \x and two digits
2   unicode = unicode string      - uses \u and four digits
3   Both are subclasses of basestring:

        basestring
            /\
           /  \
        str    unicode
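The two escape styles show up when you look at the repr of each kind of object. This is a sketch in Python 3, where the text type is called `str` and the byte-string type `bytes` (and `basestring` no longer exists); the euro sign is just a convenient sample:

```python
text = u"\u20ac"            # one Unicode character (the euro sign)
raw = text.encode("utf-8")  # the same character as a UTF-8 byte string

text_repr = ascii(text)     # text is escaped with \u and four hex digits
raw_repr = repr(raw)        # bytes are escaped with \x and two hex digits

text_len = len(text)        # counted in characters
raw_len = len(raw)          # counted in bytes
```

One character of text becomes three bytes here, which is why lengths and indexes differ between the two types.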

If you follow a few simple rules, you should be fine handling str/unicode objects in different encodings such as ascii, utf-8, or whatever encoding you have to use:

Rules
1    encode(): Gets you from Unicode -> bytes
     encode([encoding], [errors='strict']) returns an 8-bit string version of the Unicode string.
2    decode(): Gets you from bytes -> Unicode
     decode([encoding], [errors]) interprets the 8-bit string using the given encoding.
3    codecs.open(encoding="utf-8"): Reads and writes files directly to/from Unicode (you can use any encoding, not just utf-8, but utf-8 is the most common).
4    u'': Makes your string literals into Unicode objects rather than byte sequences.
5    unicode(string[, encoding, errors]): Builds a Unicode object by decoding an 8-bit string with the given encoding.
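The rules above can be sketched as a short round trip. It is written so that it runs on Python 3 (there, the built-in `open(..., encoding=...)` does the same job as `codecs.open`, and `unicode(...)` from rule 5 is simply `bytes.decode`). The sample text and the file name `greeting.txt` are made up for illustration:

```python
import codecs
import os
import tempfile

text = u"Grüße aus Köln"  # rule 4: a Unicode literal

# Rule 1: encode() takes you from Unicode to bytes.
raw = text.encode("utf-8")
# The errors argument controls what happens to unencodable characters.
lossy = text.encode("ascii", "replace")

# Rule 2: decode() takes you from bytes back to Unicode.
restored = raw.decode("utf-8")

# Rule 3: codecs.open() encodes on write and decodes on read for you.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(text)
with codecs.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
```

Note that the program only ever passes Unicode text to `f.write()`; the file object is the single place where bytes are produced.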

Warning: Don’t use encode() on bytes or decode() on Unicode objects
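The reason for the warning: in Python 2, calling encode() on a byte string makes Python decode it with the ascii codec first, and that hidden step is what raises UnicodeDecodeError as soon as a non-ASCII byte appears. The sketch below spells out that hidden step explicitly so it also runs on Python 3; the sample word is my own:

```python
raw = u"Übung".encode("utf-8")  # bytes, like a Python 2 str holding UTF-8

# Python 2 treats raw.encode("utf-8") roughly like
# raw.decode("ascii").encode("utf-8"); the decode step is the one that fails.
try:
    raw.decode("ascii")
    error_name = None
    error_codec = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    error_codec = exc.encoding

# Python 3 removes the trap entirely: bytes has no encode() method
# and str has no decode() method.
bytes_can_encode = hasattr(raw, "encode")
text_can_decode = hasattr(u"Übung", "decode")
```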

And again: Software should only work with Unicode strings internally, converting to a particular encoding on output.
