Unicode Arabic string to use it

Submitted by 帅比萌擦擦* on 2019-12-12 06:36:33

Question


I have a variable holding a value like x='مصطفى' and I want to convert it to the form u'مصطفى' so I can use it again in some functions. When I try u''+x, it always gives me an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

Any help?


Answer 1:


You have to know what encoding those bytes are in, and then .decode(encoding) them to get a Unicode string. If you received them from some API, utf8 is a good guess. If you read the bytes from a file typed in Windows Notepad, it is more likely a legacy Windows code page, such as cp1256 for Arabic.

PythonWin 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32.
>>> x='مصطفى' # "Just bytes" in whatever encoding my console uses
>>> x         # Looks like UTF-8.
'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'
>>> x.decode('utf8')  # Success
u'\u0645\u0635\u0637\u0641\u0649'
>>> print(x.decode('utf8'))
مصطفى
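If the encoding is not known for certain, one option is to try a short list of candidates in order. The following is only a sketch: the helper name decode_bytes and the candidate list ("utf-8", "cp1256") are assumptions for illustration, not part of the original answer. UTF-8 goes first because it rejects most byte sequences that were written in another encoding, while a single-byte code page accepts almost anything.

```python
# -*- coding: utf-8 -*-

def decode_bytes(data, encodings=("utf-8", "cp1256")):
    """Return text decoded with the first candidate encoding that succeeds.

    Hypothetical helper: the candidate list is a guess and should be
    adjusted to wherever the bytes actually come from.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings matched: %r" % (encodings,))


# The same bytes as in the console session above (UTF-8).
raw = b'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'
name = decode_bytes(raw)
```

A guess like this can still decode to the wrong characters without raising, so when the source of the bytes documents its encoding, passing that encoding explicitly is the safer choice.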



Answer 2:


Thanks, I solved it :)

The solution is to do this:

u''.encode('utf-8')+x
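It is worth checking what type that expression produces. In the sketch below, x is assumed to hold the UTF-8 bytes shown in Answer 1. Encoding the empty Unicode string gives an empty byte string, so the concatenation avoids the error but the result is still bytes, not a u'...' string.

```python
# -*- coding: utf-8 -*-

# Assumption: x holds the UTF-8 bytes of the name from the question.
x = b'\xd9\x85\xd8\xb5\xd8\xb7\xd9\x81\xd9\x89'

# u''.encode('utf-8') is an empty byte string, so this is bytes + bytes.
joined = u''.encode('utf-8') + x
is_still_bytes = isinstance(joined, bytes) and joined == x

# An explicit decode is what actually yields a Unicode string.
as_text = x.decode('utf-8')
```

If the functions downstream expect Unicode text, the decoded value (as_text here) is the one to pass on.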



Answer 3:


There are two things.

First, the meaning of x='مصطفى' is ill-defined: it changes if you save your source file in another encoding. On the other hand, x=u'مصطفى'.encode('utf-8') unambiguously means “the bytes you get when you encode that text with UTF-8”.

Second, either use bytes ('abc' or b'abc') or unicode (u'abc'), but don't mix them. Mixing them in Python 2.x triggers an implicit conversion with the default encoding (normally ASCII), so the result depends on the data and on where you execute that code. In Python 3.x it raises an error (for good reasons).

So given a byte string x, either:

# bytes
'' + x

or:

# unicode, so decode the byte string
u'' + x.decode('utf-8')
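Both points can be sketched together. The helper name ensure_text is hypothetical and UTF-8 is only an assumed default; the idea is to decode once at the boundary and then work with Unicode everywhere else.

```python
# -*- coding: utf-8 -*-

def ensure_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged.

    Hypothetical helper; the default encoding is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text = u'\u0645\u0635\u0637\u0641\u0649'  # the name from the question

# First point: the same text becomes different bytes under different encodings.
utf8_bytes = text.encode('utf-8')
cp1256_bytes = text.encode('cp1256')
different_bytes = utf8_bytes != cp1256_bytes

# Second point: normalize to text before combining, never mix the two types.
greeting = u'Hello, ' + ensure_text(utf8_bytes)
```

On Python 2, bytes is an alias for str, so the same isinstance check should behave the same way there.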


Source: https://stackoverflow.com/questions/37555473/unicode-arabic-string-to-user-it
