Read a text file with non-ASCII characters in an unknown encoding

一生所求 2020-12-16 14:59

I want to read a file that contains not only ASCII characters but also German ones (such as ä, ö, ü and ß). I found that I can do it like this:

  >>> import codecs


        
2 Answers
  •  暗喜 (OP)
    2020-12-16 15:40

    You need to know which character encoding the text uses. If you don't know that beforehand, you can try to guess it with the chardet module; its answer is a statistical guess reported with a confidence score, not a guarantee. First install it:

    $ pip install chardet
    

    Then detect the encoding, for example by reading the file in binary mode and passing the bytes to chardet.detect:

    >>> import chardet
    >>> chardet.detect(open("file.txt", "rb").read())
    {'confidence': 0.9690625, 'encoding': 'utf-8'}
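If chardet is not available, or its confidence is low, a cruder stdlib-only alternative (a different technique from chardet's statistical detection) is to try a short list of candidate encodings in order and keep the first one that decodes without errors. This is a sketch under the assumption that the file is either UTF-8 or a Western European single-byte encoding; the candidate list and the helper names `decode_with_fallback` and `read_text_guessing` are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical candidate list. Strict UTF-8 decoding rejects most non-UTF-8
# byte sequences, so it goes first; cp1252 covers typical Windows files with
# German text; latin-1 maps every byte to a character, never fails, and
# therefore must come last.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw: bytes, candidates=CANDIDATES) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


def read_text_guessing(path) -> tuple[str, str]:
    """Read a file as raw bytes and decode it with decode_with_fallback."""
    return decode_with_fallback(Path(path).read_bytes())


if __name__ == "__main__":
    sample = "Grüße aus München"
    print(decode_with_fallback(sample.encode("utf-8")))   # ('Grüße aus München', 'utf-8')
    print(decode_with_fallback(sample.encode("cp1252")))  # ('Grüße aus München', 'cp1252')
```

Because latin-1 accepts any byte sequence, a successful latin-1 decode does not prove the file is latin-1; treat the result as a guess, just like chardet's.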
    

    Then open the file with the detected encoding:

    >>> import codecs
    >>> import unicodedata
    >>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
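In Python 3 the built-in open() accepts the encoding directly and returns str, so codecs.open is not strictly required. A minimal sketch, assuming the encoding is already known (for instance 'utf-8' as reported by chardet above); `read_lines` is a hypothetical helper name and the demo file contents are made up:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_lines(path, encoding: str = "utf-8") -> list[str]:
    """Read a text file decoded with the given encoding and return its lines as str."""
    # The built-in open() decodes the bytes itself, so no codecs module is needed.
    # The "with" block closes the file even if decoding raises an error.
    with open(path, "r", encoding=encoding) as f:
        return f.readlines()


if __name__ == "__main__":
    # Demo: write German text as UTF-8 to a temporary file, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "file.txt"
        path.write_text("Grüße\nÄrger über Öl\n", encoding="utf-8")
        print(read_lines(path))  # ['Grüße\n', 'Ärger über Öl\n']
```

If a few bytes in the file may be invalid for the chosen encoding, open() also accepts errors='replace', which substitutes U+FFFD instead of raising UnicodeDecodeError.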
    
