Windows Python: Changing encoding using the locale module

Posted by 你说的曾经没有我的故事 on 2021-01-27 12:41:40

Question


Using Python 2.7

I am writing an abstract web scraper and am having problems when displaying (printing) certain characters.

I get the traceback error UnicodeEncodeError: 'ascii' codec can't encode character u'\u2606' in position 5: ordinal not in range(128) when printing a string that contains the character.

I used the locale module to find out which settings my OS supports, although I'm not certain locale is the right tool for my problem, and noticed that the default settings were ('en_US', 'cp1252'). I am trying to change them to ('en_US', 'utf-8'), but to no avail.

# Code for the default settings
import locale
print locale.getdefaultlocale()

This is the code I used to narrow down my locale setting options. (No problems here; the code is included only so that anyone who wants to can follow along.)

import locale

# locale_alias is a dict, so it is accessed as an attribute, not called
aliases = locale.locale_alias.items()
utfs = [(k, v) for k, v in aliases if 'utf' in k.lower() or 'utf' in v.lower()]

# utf settings starting with en
en_utfs = [(k, v) for k, v in utfs
           if k.lower()[:2] == 'en' or v.lower()[:2] == 'en']

print en_utfs

This gives the output:

[('en_ie.utf8@euro', 'en_IE.UTF-8'), ('universal.utf8@ucs4', 'en_US.UTF-8')]

Here is where my problem lies; with trying to change the setting to en_US.UTF-8.

[IN]: locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' )
[OUT]: Traceback code ...
[OUT]: locale.Error: unsupported locale setting

Sorry for all the code, for some reason I felt the excessive need to do so.


Answer 1:


Check this: https://docs.moodle.org/dev/Table_of_locales

I think on Windows you need to set the 'localewin' value from that table instead of the locale name. Setting locale.setlocale( locale.LC_ALL, 'English_United States.1252' ) worked for me on Windows. I also tried other locales, such as Dutch_Netherlands.1252, and they worked. This might not solve your UnicodeEncodeError, but I think it at least explains why you are unable to set the locale.
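
Because the accepted names differ between platforms, one option is to try several candidates in turn and keep the first one that works. The following is only a minimal sketch: the helper name set_first_supported_locale and the candidate list are illustrative assumptions, not part of the locale module, and which names succeed depends on the operating system.

```python
import locale


def set_first_supported_locale(candidates, category=locale.LC_ALL):
    """Try each locale name in order; return the first one accepted, else None."""
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            # This platform does not recognise the name; try the next one.
            continue
    return None


# A Windows-style name first, then a POSIX-style name, then the portable "C" locale.
chosen = set_first_supported_locale(
    ["English_United States.1252", "en_US.UTF-8", "C"]
)
print(chosen)
```

Which name is printed depends on the machine. As the answer notes, a successful setlocale call does not by itself make print able to encode a character such as u'\u2606'.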




Answer 2:


I couldn't fix my problem, but I found a workaround: removing all non-ASCII characters. See the Stack Overflow answer "Replace non-ASCII characters with a single space".
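
A minimal sketch of that workaround, assuming the text is already a unicode string. The function name replace_non_ascii is hypothetical, and the regular expression is one possible approach, not necessarily the one used in the linked answer. Note that the replaced characters are lost.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(u"[^\x00-\x7f]")


def replace_non_ascii(text, replacement=u" "):
    """Return text with every non-ASCII character replaced by `replacement`."""
    return _NON_ASCII.sub(replacement, text)


# The star character from the traceback (u'\u2606') becomes a plain space.
print(replace_non_ascii(u"Rated \u2606 by users"))
```

The result contains only ASCII characters, so printing it no longer depends on the console encoding.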




Answer 3:


You need to use the full name. For example, use:

locale.setlocale( locale.LC_CTYPE, 'Chinese (Simplified)_People\'s Republic of China' )  

instead of

locale.setlocale(locale.LC_ALL,'zh_CN.cpk936')      

If this was successful you should expect this result:

print(locale.getlocale(locale.LC_CTYPE))    
("Chinese (Simplified)_People's Republic of China", '936')       


Source: https://stackoverflow.com/questions/27437325/windows-python-changing-encoding-using-the-locale-module
