Using unicode character u201c

不想你离开。 submitted on 2020-08-07 04:45:07

Question


I'm new to Python and am having problems understanding Unicode. I'm using Python 3.4. I've spent an entire day trying to figure this out by reading about Unicode, including http://www.fileformat.info/info/unicode/char/201C/index.htm and http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html

I need to refer to special quotes because they are used in the text I'm analyzing. I did test that the Windows 7 command window can read and write the two special quote characters. To make things simple, I wrote a one-line script:

print ('“') # that's the special quote mark in between normal single quotes

and get this output:

Traceback (most recent call last):
  File "C:\Users\David\Documents\Python34\Scripts\wordCount3.py", line 1, in <module>
    print ('\u201c')
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 0: character maps to <undefined>

So how do I refer to these two characters, U+201C and U+201D?

Is this the correct encoding choice in the file open statement?

with open(fileIn, mode='r', encoding='utf-8', errors='replace') as f:

Answer 1:


In Python 3.x you can't just mix Unicode strings with byte strings. You have probably read manuals dealing with Python 2.x, where such things are possible as long as the byte string contains only convertible characters.

print('\u201c', '\u201d')

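works fine for me, so the likely cause is that the encoding of your source file or terminal is wrong. The traceback points at cp437.py, the codec for the console code page, and cp437 has no mapping for U+201C.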
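
To see which side is at fault, it can help to check the encoding Python picked for the terminal and whether the characters exist in it. This is only a minimal sketch, assuming the console uses cp437 as in the traceback; can_encode is a made-up helper name, not a standard function:

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

quotes = '\u201c\u201d'  # left and right double quotation marks

# The encoding print() uses for this terminal (cp437 in the traceback).
print('stdout encoding:', sys.stdout.encoding)

print(can_encode(quotes, 'cp437'))  # False: cp437 has no curly quotes
print(can_encode(quotes, 'utf-8'))  # True: UTF-8 covers all of Unicode

# ascii() returns an escaped form that any terminal can display.
print(ascii(quotes))
```

If the first check is False for your terminal's encoding, the string itself is fine and only the output step is failing.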

You can also explicitly tell Python which encoding your source file uses by putting the following line at the top of the file (in Python 3, UTF-8 is already the default source encoding):

 # -*- coding: utf-8 -*-
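
Note that this declaration controls how the bytes of the source file are decoded, not what the console can display. A small sketch of that effect, compiling the same bytes under two different declarations; run_source is a hypothetical helper written for this illustration:

```python
# UTF-8 bytes of the left curly quote (U+201C), as they would sit in a source file.
quote_bytes = '\u201c'.encode('utf-8')

def run_source(cookie):
    """Compile and run a one-assignment module declared with the given encoding."""
    source = b'# -*- coding: ' + cookie + b' -*-\ns = "' + quote_bytes + b'"\n'
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['s']

as_utf8 = run_source(b'utf-8')      # bytes decoded as one character
as_latin1 = run_source(b'latin-1')  # same bytes decoded as three characters

print(ascii(as_utf8), len(as_utf8))
print(ascii(as_latin1), len(as_latin1))
```

Only the declaration that matches how the editor saved the file yields the single character U+201C.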

Added: it seems that you're working on a Windows machine. If so, you could change your console code page to UTF-8 by running

chcp 65001

before you start the Python interpreter. That change is temporary and applies only to the current console session. If you want it to be permanent, import the following .reg file:

Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Console]
"CodePage"=dword:fde9

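If changing the console code page is not an option, another possibility is to make the output stream tolerant of characters it cannot encode. The following is a rough sketch, assuming it is acceptable to see \u201c escapes instead of the real glyphs; tolerant_writer is a hypothetical helper, not part of the standard library:

```python
import io

def tolerant_writer(binary_stream, encoding):
    """Wrap a binary stream so unencodable characters come out as \\uXXXX escapes."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors='backslashreplace', line_buffering=True)

# Demonstration against an in-memory buffer standing in for a cp437 console.
buffer = io.BytesIO()
out = tolerant_writer(buffer, 'cp437')
print('\u201chello\u201d', file=out)
out.flush()
print(buffer.getvalue())

# For the real console, sys.stdout.buffer could be wrapped the same way:
# import sys
# sys.stdout = tolerant_writer(sys.stdout.buffer, sys.stdout.encoding)
```

With this wrapper print() no longer raises UnicodeEncodeError, at the cost of showing escapes for anything the code page lacks.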

Source: https://stackoverflow.com/questions/35281774/using-unicode-character-u201c
