Python - Encoding string - Swedish Letters

Submitted by 一笑奈何 on 2019-11-27 19:26:52

Question


I'm having some trouble with Python's raw_input function (Python 2.6). For some reason, raw_input does not accept the converted string that swedify() produces, and this gives me an encoding error. I'm aware of the encoding issue, which is why I wrote swedify() to begin with. Here's what I'm trying to do:

elif cmd in ('help', 'hjälp', 'info'):
    buffert += 'Just nu är programmet relativt begränsat,\nDe funktioner du har att använda är:\n'
    buffert += ' * historik :: skriver ut all din historik\n'
    buffert += ' * ändra <något> :: ändrar något i databasen, följande finns att ändra:\n'
    print swedify(buffert)

This works just fine: it prints the Swedish characters to the console just as I want. But when I try to print this piece (in the same code, with the same \x?? values):

core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
core['goalTime'] = raw_input(swedify('Vad är ditt mål i minuter att springa ' +  core['goalDistance'] + 'km på: '))

Then I get this:

C:\Users\Anon>python löp.py
Traceback (most recent call last):
  File "l÷p.py", line 92, in <module>
    core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

Now, I've googled around and found some "solutions", but none of them work. Some said that I have to create a batch script that executes chcp ??? at the beginning, but that's not a clean solution IMO.

Here is swedify:

def swedify(inp):
    try:
        return inp.decode('utf-8')
    except:
        return '(!Dec:) ' + str(inp)

Any solutions on how to get raw_input to read my return value from swedify()? I've tried from encodings import getencoder, getdecoder and others, but nothing made it better.


Answer 1:


You mention that an encoding error motivated you to write swedify in the first place, and that the solutions you found involve chcp, which is a Windows command.

On *nix systems with UTF-8 terminals, swedify is not necessary:

>>> raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 100
'100'
>>> a = raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 200
>>> a
'200'

FWIW, when I do use swedify, I get the same error you do:

>>> def swedify(inp):
...     try:
...         return inp.decode('utf-8')
...     except:
...         return '(!Dec:) ' + str(inp)
... 
>>> swedify('Hur långt i kilometer är ditt mål: ') 
u'Hur l\xe5ngt i kilometer \xe4r ditt m\xe5l: '
>>> raw_input(swedify('Hur långt i kilometer är ditt mål: '))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

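Your swedify function returns a unicode object. The built-in raw_input does not handle unicode prompts well: as the traceback shows, the prompt is implicitly encoded with the 'ascii' codec, which fails on non-ASCII characters such as å.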

>>> raw_input("å")
åeee
'eee'
>>> raw_input(u"å")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
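
The failure above can be reproduced without raw_input. The following is a minimal sketch, not part of the original answer: it encodes the same prompt explicitly, first with the 'ascii' codec and then with 'cp850'. The choice of cp850 is an assumption about the console's code page, and the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: the same prompt text, encoded explicitly with two different codecs.
prompt = u'Hur långt i kilometer är ditt mål: '

try:
    prompt.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    # Index of the first character the ascii codec cannot represent
    failing_position = exc.start

# Assumption: the console uses code page 850, which does contain å and ä
encoded = prompt.encode('cp850')
```

The ascii attempt fails at the same position reported in the traceback, while the explicit cp850 encoding succeeds and yields a byte string that raw_input can print unchanged.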

You might want to try this in Python 3. See this Python bug.

Also of interest: How to read Unicode input and compare Unicode strings in Python?.

UPDATE According to this blog post there is a way to set the system's default encoding. This might be worth a try.




Answer 2:


For me it worked fine with:

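# -*- coding: utf-8 -*-
import sys

# Encoding of the console attached to stdin (for example cp850)
koden = sys.stdin.encoding

# Encode the unicode prompt to bytes before passing it to raw_input
a = raw_input(u'Frågan är öppen? '.encode(koden))
print a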

Per
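
One caveat with the approach above is that a stream's encoding attribute can be None when input or output is redirected. The following is a hedged sketch of a helper that falls back to a default in that case. The names encode_prompt and FakeConsole, and the fallback value 'utf-8', are hypothetical and not from the original answer.

```python
# -*- coding: utf-8 -*-
import sys


def encode_prompt(text, stream=None, fallback='utf-8'):
    """Encode a unicode prompt for a byte-oriented console (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    # The encoding attribute may be missing or None when the stream is redirected
    encoding = getattr(stream, 'encoding', None) or fallback
    # 'replace' substitutes '?' for characters the target encoding lacks
    return text.encode(encoding, 'replace')


class FakeConsole(object):
    """Stand-in for a cp850 console, used only for this demonstration."""
    encoding = 'cp850'


encoded = encode_prompt(u'Frågan är öppen? ', FakeConsole())
```

On Python 2 the result could be passed straight to raw_input, for example raw_input(encode_prompt(u'Frågan är öppen? ')).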




Answer 3:


On Windows, the console's native Unicode support is broken. Even the apparent UTF-8 codepage isn't a proper fix.

To read from and write to the Windows console, you need to use https://github.com/Drekin/win-unicode-console, which works directly with the underlying console API, so that multi-byte characters are read and written correctly.
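
A minimal sketch of how the package might be switched on at program start, assuming it has been installed separately. The wrapper function enable_console_unicode is a hypothetical name; the only call taken from the package is win_unicode_console.enable().

```python
import sys


def enable_console_unicode():
    """Enable win_unicode_console if possible; return True on success.

    Hypothetical wrapper: it does nothing on other platforms or when
    the package is not installed.
    """
    if sys.platform != 'win32':
        return False
    try:
        import win_unicode_console
    except ImportError:
        return False
    # Replaces the standard streams with ones that use the console API
    win_unicode_console.enable()
    return True


enabled = enable_console_unicode()
```

Calling this once, before any prompt is printed, keeps the rest of the script unchanged.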




Answer 4:


The Windows command prompt uses code page 850 with Swedish regional settings (https://en.wikipedia.org/wiki/Code_page_850), probably for backwards compatibility with old MS-DOS programs.

You can set the Windows command prompt to use UTF-8 by entering chcp 65001 (see "Unicode characters in Windows command line - how?").
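
The code page mismatch is visible in the question's traceback, where löp.py is displayed as l÷p.py. The sketch below reproduces that effect under the assumption that the file name was stored in the Windows-1252 code page and displayed by a cp850 console. Both code page choices are assumptions, not facts from the original post.

```python
# -*- coding: utf-8 -*-
# Sketch: one byte, two interpretations.
filename = u'löp.py'

# Assumption: the name is encoded with Windows-1252 (ö becomes byte 0xF6)
raw_bytes = filename.encode('cp1252')

# A cp850 console interprets byte 0xF6 as the division sign
shown_in_console = raw_bytes.decode('cp850')
```

The same bytes decode to different characters depending on the code page, which is why the prompt has to be encoded for the console that will actually display it.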




Answer 5:


Try this magic comment at the very top of your script:

# -*- coding: utf-8 -*-

Here is some information about it: http://www.python.org/dev/peps/pep-0263/
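
A small sketch of what the declaration controls: it tells the interpreter how to decode the bytes of the source file. The example compiles two byte strings that contain the letter å in different encodings. The helper run_source is hypothetical, and the sketch was written with Python 3 in mind.

```python
def run_source(source_bytes):
    """Compile and run source bytes, returning the variable s (hypothetical helper)."""
    namespace = {}
    exec(compile(source_bytes, '<demo>', 'exec'), namespace)
    return namespace['s']


# The letter å is the single byte 0xE5 in Latin-1 ...
latin1_source = b"# -*- coding: latin-1 -*-\ns = u'\xe5'\n"
# ... and the two bytes 0xC3 0xA5 in UTF-8
utf8_source = b"# -*- coding: utf-8 -*-\ns = u'\xc3\xa5'\n"

from_latin1 = run_source(latin1_source)
from_utf8 = run_source(utf8_source)
```

Both sources produce the same one-character string because each declaration matches the bytes that follow it. The declaration only affects how literals in the file are read, not how text is later written to the console.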




Answer 6:


Solution to a lot of problems:


Edit C:\Python??\Lib\site.py and replace "del sys.setdefaultencoding" with "pass".

Then put this at the top of your code:

import sys
sys.setdefaultencoding('latin-1')

The holy grail of fixing the Swedish/non-UTF8 compatible characters.
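
To check whether the edit took effect, the interpreter's current default can be inspected. The following is a minimal sketch with a hypothetical helper name. It only reads the settings and does not change them.

```python
import sys


def default_encoding_report():
    """Return (default encoding, whether sys.setdefaultencoding is available)."""
    return sys.getdefaultencoding(), hasattr(sys, 'setdefaultencoding')


encoding_name, setter_available = default_encoding_report()
```

On an unmodified Python 2 installation the setter is normally removed at startup, so the second value is expected to be False until site.py has been edited as described.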



Source: https://stackoverflow.com/questions/7315629/python-encoding-string-swedish-letters
