RPython ord() with non-ascii character

落花浮王杯 submitted on 2019-12-13 04:16:05

Question


I'm making a virtual machine in RPython using PyPy. My problem is that I am converting each character into its numerical representation. For example, converting the letter "a" gives 97, which I then convert to hex to get 0x61.

When I try to convert the letter "á" into its hexadecimal representation, which should be 0xe1, I instead get 0xc3 0xa1.

Is there a specific encoding I need to use? Currently I'm using UTF-8.

--UPDATE--

Here instr is "á" (including the quotes):

for char in instr:
    char = str(int(ord(char)))
    char = hex(int(char))
    char = char[2:]
    print char # Prints 22 C3 A1 22, 22 is each of the quotes
    # The desired output is 22 E1 22

Answer 1:


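#!/usr/bin/env python
# -*- coding: latin-1 -*-
# Python 2; this file must actually be saved as Latin-1.

char = 'á'  # a one-byte str, because the source encoding is Latin-1

print str(int(ord(char)))               # decimal value of the byte
print hex(ord(char))                    # hex value of the byte
print hex(ord(char.decode('latin-1')))  # hex code point after decoding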

Gives me:

225
0xe1
0xe1
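
The asker's C3 A1 and this answer's E1 are the same character stored under two different encodings. A minimal sketch of that comparison, assuming Python 3 (where text and bytes are separate types, unlike the Python 2 code above); the helper name `hex_bytes` is hypothetical and not part of the original answer:

```python
# Hypothetical helper: show the bytes of `text` under a given encoding.
def hex_bytes(text, encoding):
    # Encode the text to bytes, then format each byte as two hex digits.
    return " ".join("%02x" % b for b in text.encode(encoding))

a_acute = "\u00e1"  # the character "á", written as an escape

print(hex_bytes(a_acute, "latin-1"))  # e1
print(hex_bytes(a_acute, "utf-8"))    # c3 a1
```

Latin-1 maps each of the first 256 code points to a single byte of the same value, which is why the byte and the code point agree there; UTF-8 needs two bytes for anything above 0x7F.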



Answer 2:


You are using Python 2, so your string "á" is a byte string, and its contents depend on the encoding of your source file. If the encoding is UTF-8, the contents are C3 A1: the string contains two bytes.

If you want to iterate over Unicode code points (aka characters), or UTF-16 code units (depending on how your Python installation was built), convert the string to unicode first, for example using .decode('utf-8').

# -*- encoding: utf-8 -*-

def stuff(instr):
  for char in instr:
    char = str(int(ord(char)))
    char = hex(int(char))
    # I'd replace those two lines above with char = hex(ord(char))
    char = char[2:]
    print char 

stuff("á")
print("-------")
stuff(u"á")

Outputs:

c3
a1
-------
e1
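
When the input arrives as raw bytes rather than as a literal in the source file, the same idea can be sketched as "decode first, then call ord()". This is an illustrative version, assuming Python 3 and UTF-8 input; the function name `codepoint_hex` is hypothetical, and whether the same calls are available inside RPython is not confirmed here:

```python
def codepoint_hex(raw, encoding="utf-8"):
    # Decode the bytes to text first, so each loop item is one
    # character (code point) rather than one byte.
    text = raw.decode(encoding)
    return ["%x" % ord(ch) for ch in text]

# The UTF-8 bytes of "á" with the surrounding quotes, as in the question.
source_bytes = b'"\xc3\xa1"'

print(" ".join(codepoint_hex(source_bytes)))  # 22 e1 22
```

Skipping the decode step and looping over the bytes directly would yield 22 c3 a1 22, which is the output the question reports.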


Source: https://stackoverflow.com/questions/23271542/rpython-ord-with-non-ascii-character
