Unicode (UTF-8) reading and writing to files in Python

谎友^ 2020-11-22 17:10

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1


        
14 Answers
  •  面向向阳花
    2020-11-22 17:35

    Well, your favorite text editor does not realize that \xc3\xa1 is supposed to be a pair of escape sequences for one character; it treats what you typed as plain text. That's why you get the double backslashes in the last line: your file now contains a real backslash followed by xc3, and so on.
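To make the difference concrete, here is a small sketch. It assumes a current Python 3 interpreter rather than the 2.4 from the question, and the two bytes literals are stand-ins I made up for "what Python writes" versus "what the editor saves when you type the characters".

```python
# Assumption: Python 3. The bytes literals below stand in for file contents.

# Escape sequences in a literal: two bytes, the UTF-8 encoding of a-acute.
as_escape = b"\xc3\xa1"

# The same characters typed into an editor: a backslash, 'x', 'c', '3', ...
as_typed = b"\\xc3\\xa1"

print(len(as_escape))  # 2
print(len(as_typed))   # 8

# Only the first form decodes to the single character U+00E1.
decoded = as_escape.decode("utf-8")
print(decoded == "\xe1")  # True

# repr() of the typed form shows the doubled backslashes from the question.
typed_repr = repr(as_typed.decode("ascii"))
print(typed_repr)
```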

    If you want to read and write encoded files in Python, it is best to use the codecs module.
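A minimal sketch of what that could look like, assuming UTF-8 as the file encoding and a throwaway temp file (the name f1.txt is just for illustration). It is written to run on a current Python 3; codecs.open itself is also there in Python 2, and on newer versions the built-in open with an encoding argument does the same job.

```python
import codecs
import os
import tempfile

text = u"Capit\xe1n"  # a-acute as a single code point

# Hypothetical scratch file, not a path from the question.
path = os.path.join(tempfile.mkdtemp(), "f1.txt")

# codecs.open encodes on write...
f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(text)
finally:
    f.close()

# ...and decodes on read, so you get the same text back.
f = codecs.open(path, "r", encoding="utf-8")
try:
    round_trip = f.read()
finally:
    f.close()

# Reading in binary mode shows the bytes that actually landed on disk.
raw_file = open(path, "rb")
try:
    raw = raw_file.read()
finally:
    raw_file.close()

print(round_trip == text)  # True
print(len(raw))            # 8, because a-acute takes two bytes in UTF-8
```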

    Pasting text between the terminal and applications is difficult, because you don't know which program will interpret your text using which encoding. You could try the following:

    >>> s = file("f1").read()
    >>> print unicode(s, "Latin-1")
    Capitán
    

    Then paste this string into your editor and make sure that it stores it using Latin-1. Under the assumption that the clipboard does not garble the string, the round trip should work.
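Here is a rough illustration of why the encoding has to match on both sides. It assumes Python 3 and uses in-memory bytes instead of a real file or clipboard, so it only sketches the idea rather than reproducing the session above.

```python
# Assumption: Python 3; bytes objects stand in for what is stored in the file.
text = "Capit\xe1n"

latin1_bytes = text.encode("latin-1")  # 7 bytes, a-acute is the single byte 0xE1
utf8_bytes = text.encode("utf-8")      # 8 bytes, a-acute is 0xC3 0xA1

# Same encoding both ways: the round trip works.
round_trip = latin1_bytes.decode("latin-1")
print(round_trip == text)  # True

# UTF-8 bytes read as Latin-1: each byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled == "Capit\xc3\xa1n")  # True
print(len(garbled))                 # 8
```

So if the editor saves as UTF-8 but you decode as Latin-1 (or the other way around), the a-acute comes back as two wrong characters or fails to decode.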
