How to read text copied from the web into a txt file using Python

Submitted by 白昼怎懂夜的黑 on 2019-12-24 12:44:35

Question


I'm learning how to read text files. I used this approach:

f=open("sample.txt")

print(f.read())

It worked fine when I typed the txt file myself. But when I copied text from a news article on the web, it produced the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 738: character maps to <undefined>

I tried changing the Encoding setting in Notepad++ to UTF-8, as I read somewhere that the encoding is the cause.

I also tried using:

f=open("sample.txt",encoding='utf-8')

(a suggestion I found elsewhere)

But it still didn't work.


Answer 1:


You're on Windows and trying to print to the console. Reading the file succeeds; it is print() that throws the exception.
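To see why the failure comes from output rather than from reading the file, here is a minimal sketch that checks whether a string can be encoded in a given console code page. The helper name `can_encode` and the choice of cp437 as the console code page are assumptions for illustration; the actual code page depends on the machine.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "Prices rose \u2014 again."  # contains an em-dash, like the copied article

# cp437 stands in for a legacy 8-bit console code page (an assumption).
print("cp437:", can_encode(sample, "cp437"))
print("utf-8:", can_encode(sample, "utf-8"))
```

On a real console, `sys.stdout.encoding` reports the encoding that print() will use, so it can be passed as the second argument.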

The Windows console natively supports only 8-bit code pages, so any character outside your region's code page will fail to encode (despite what people say about chcp 65001).

You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters for both input and output.

Alternatively, don't print to the console; write your output to a file opened with an explicit encoding instead. For example:

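with open("sample.txt", encoding="utf-8") as f:
    body = f.read()  # the text read from the input file
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    my_log.write(body)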

Ensure you open the file with the correct encoding.




Answer 2:


I assume that you are using Python 3, judging from the open and print syntax you use.

The offending character u"\u2014" is an em-dash. As I assume you are using Windows, switching the console to UTF-8 (chcp 65001) may help, provided your Windows and Python versions are not too old.

If it is a batch script, and the print is only there to produce traces, you could encode explicitly with errors='replace'. For example, assuming that your console uses code page 850:

print(f.read().encode('cp850', 'replace').decode('cp850'))

This will replace all unmapped characters with ?. That is not very nice, but at least it does not raise an exception.
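The same idea can be wrapped in a small helper so the code page does not have to be hard-coded. This is only a sketch: the name `to_console_safe` is hypothetical, and falling back to UTF-8 when `sys.stdout.encoding` is unavailable is an assumption.

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Use the console's own encoding unless one is given explicitly.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Encode with errors="replace", then decode back to a printable str.
    return text.encode(encoding, errors="replace").decode(encoding)


# cp850 is passed explicitly here to mirror the example above.
print(to_console_safe("Prices rose \u2014 again.", "cp850"))
# prints: Prices rose ? again.
```

On Python 3.7 and later, calling `sys.stdout.reconfigure(errors="replace")` once at startup should have a similar effect for every subsequent print().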



Source: https://stackoverflow.com/questions/36236066/how-to-read-text-copied-from-web-to-txt-file-using-python
