Python get character code in different encoding?


情书的邮戳 2021-02-04 06:05

Given a character code as integer number in one encoding, how can you get the character code in, say, utf-8 and again as integer?

3 answers

萌比男神i (original poster)

2021-02-04 06:55
You can only map an "integer number" from one encoding to another if they are both single-byte encodings.

Here's an example (Python 2) using "iso-8859-15" and "cp1252" (aka "ANSI"):
```
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('cp1252')
'\x80'
>>> ord(s.encode('cp1252'))
128
>>> ord(s.encode('iso-8859-15'))
164
```
Note that ord is here being used to get the ordinal number of the encoded byte. Using ord on the original unicode string would give its unicode code point:
```
>>> ord(s)
8364
```
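The sessions above are Python 2, where encoding returns a `str` of bytes. As a minimal Python 3 sketch of the same single-byte mapping, the integer can be decoded from the source encoding and re-encoded in the target one. The helper name `recode_single_byte` is hypothetical and only serves as illustration:

```python
def recode_single_byte(code, src, dst):
    """Map a one-byte code in encoding `src` to the one-byte code in `dst`.

    Hypothetical helper: raises UnicodeError if the byte is undefined in
    `src` or the character cannot be encoded in `dst`, and ValueError if
    the character needs more than one byte in `dst`.
    """
    char = bytes([code]).decode(src)   # integer -> byte -> Unicode character
    encoded = char.encode(dst)         # Unicode character -> bytes in dst
    if len(encoded) != 1:
        raise ValueError("%r needs %d bytes in %s" % (char, len(encoded), dst))
    return encoded[0]                  # indexing bytes yields an int in Python 3


# The euro sign is 0xA4 (164) in iso-8859-15 and 0x80 (128) in cp1252.
print(recode_single_byte(0xA4, "iso-8859-15", "cp1252"))  # 128
print(recode_single_byte(0x80, "cp1252", "iso-8859-15"))  # 164
```

Going through the Unicode character in the middle is what makes the mapping work: the two integers are only related because they name the same character.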
The reverse operation to ord can be done in Python 2 using either chr (for codes in the range 0 to 255, returning a one-byte str) or unichr (for codes in the range 0 to sys.maxunicode, returning a unicode string):
```
>>> print chr(65)
A
>>> print unichr(8364)
€
```
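In Python 3 there is no `unichr`; as far as the built-ins go, `chr` covers the whole Unicode range and a single byte is built with `bytes([n])`. A short sketch, assuming Python 3.3 or later:

```python
import sys

letter = chr(65)               # code point -> one-character str
euro = chr(8364)               # works for any code point up to sys.maxunicode
raw = bytes([164])             # integer -> single byte (0 to 255 only)

print(letter)                                # A
print(euro == "\u20ac")                      # True
print(raw.decode("iso-8859-15") == euro)     # True
print(sys.maxunicode == 0x10FFFF)            # True
```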
For multi-byte encodings, a simple "integer number" mapping is usually not possible.

Here's the same example as above, but using "iso-8859-15" and "utf-8":
```
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('utf-8')
'\xe2\x82\xac'
>>> [ord(c) for c in s.encode('iso-8859-15')]
[164]
>>> [ord(c) for c in s.encode('utf-8')]
[226, 130, 172]
```
The "utf-8" encoding uses three bytes to encode the same character, so a one-to-one mapping is not possible. Having said that, many encodings (including "utf-8") are designed to be ASCII-compatible, so a mapping is usually possible for codes in the range 0-127 (but only trivially so, because the code will always be the same).
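If a single integer is still needed for a multi-byte encoding, one possible convention is to read the encoded bytes as one big-endian number. This is only a sketch in Python 3: the helper name `code_in_encoding` is hypothetical, and the packed value is an assumption about what "character code" should mean here, not a standard quantity.

```python
def code_in_encoding(code_point, encoding):
    """Return the encoded byte values and those bytes packed into one int.

    Hypothetical helper: packing the bytes big-endian is just one
    convention for turning a multi-byte sequence into a single integer.
    """
    encoded = chr(code_point).encode(encoding)
    return list(encoded), int.from_bytes(encoded, "big")


byte_values, packed = code_in_encoding(0x20AC, "utf-8")  # U+20AC is the euro sign
print(byte_values)   # [226, 130, 172]
print(hex(packed))   # 0xe282ac

# For ASCII characters the result is the same as the code point.
print(code_in_encoding(65, "utf-8"))  # ([65], 65)
```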