Unicode full width to standard ASCII (and back) in Python

Posted by 橙三吉。 on 2020-02-05 13:16:51

Question


I need a method to convert a string from standard ASCII to Unicode FULLWIDTH characters and vice versa in pure Python 2.6. The string may also contain symbols.

I tried unicodedata.normalize, but it doesn't convert symbols, and it only works in one direction. The solutions I found in other questions don't work well for my program either (many of them don't convert symbols).
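A minimal sketch of what unicodedata.normalize does with the decoded name. It assumes Python 3; the question targets 2.6, where the literal would need a u prefix. NFKC folds the FULLWIDTH letters and U+3000 IDEOGRAPHIC SPACE to ASCII, but U+2212 MINUS SIGN has no compatibility decomposition and passes through unchanged. There is also no inverse normalization that would restore the FULLWIDTH form.

```python
import unicodedata

# The decoded save name from the question: FULLWIDTH letters,
# U+2212 MINUS SIGN and U+3000 IDEOGRAPHIC SPACE.
wide_text = ('\uff28\uff41\uff4c\uff46\u2212\uff2c\uff49\uff46\uff45'
             '\uff33\uff59\uff53\uff54\uff45\uff4d\u3000\uff24\uff41\uff54\uff41')

# NFKC applies compatibility decompositions: FULLWIDTH forms and U+3000
# become their ASCII equivalents...
normalized = unicodedata.normalize('NFKC', wide_text)

# ...but U+2212 has no compatibility mapping, so it survives as-is.
print(normalized == 'Half\u2212LifeSystem Data')  # True
print(normalized == 'Half-LifeSystem Data')       # False
```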

I am writing a save-file reader/writer for the PS2. For example, I read this string from a file:

'\x82g\x82\x81\x82\x8c\x82\x86\x81|\x82k\x82\x89\x82\x86\x82\x85\x82r\x82\x99\x82\x93\x82\x94\x82\x85\x82\x8d\x81@\x82c\x82\x81\x82\x94\x82\x81'

It is Shift JIS-encoded, so I decode it with .decode('s-jis'):

u'\uff28\uff41\uff4c\uff46\u2212\uff2c\uff49\uff46\uff45\uff33\uff59\uff53\uff54\uff45\uff4d\u3000\uff24\uff41\uff54\uff41'

and I print it:

Ｈａｌｆ−ＬｉｆｅＳｙｓｔｅｍ　Ｄａｔａ

This is the FULLWIDTH string that I need to convert to ASCII; it should become:

'Half-LifeSystem Data'

(there is nothing between Life and System)

Note that I chose this save because it contains the two most common symbols: the minus sign (-) and the space.

I also need to convert it back and re-encode it exactly as it was, because the user may rename the save: I have to take the string from the input dialog and write it back to the file.


Answer 1:


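I'd use unicode.translate() to map between the two sets. The characters map one-to-one (FULLWIDTH U+FF01 to U+FF5E are ASCII 0x21 to 0x7E shifted by 0xFEE0), with the space and the minus sign handled separately: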

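# Build translation tables (Python 2). ASCII 0x21-0x7E maps to
# FULLWIDTH U+FF01-U+FF5E by a fixed offset of 0xFEE0.
ascii_to_wide = dict((i, unichr(i + 0xfee0)) for i in range(0x21, 0x7f))
ascii_to_wide.update({0x20: u'\u3000', 0x2D: u'\u2212'})  # space and minus
wide_to_ascii = dict((i, unichr(i - 0xfee0)) for i in range(0xff01, 0xff5f))
wide_to_ascii.update({0x3000: u' ', 0x2212: u'-'})        # space and minus

# wide_text and ascii_text are unicode strings (e.g. the decoded save name)
wide_text.translate(wide_to_ascii)
ascii_text.translate(ascii_to_wide)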

>>> wide_text.translate(wide_to_ascii)
u'Half-LifeSystem Data'
>>> wide_text.translate(wide_to_ascii).translate(ascii_to_wide)
u'\uff28\uff41\uff4c\uff46\u2212\uff2c\uff49\uff46\uff45\uff33\uff59\uff53\uff54\uff45\uff4d\u3000\uff24\uff41\uff54\uff41'
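On Python 3, unichr no longer exists and str is already Unicode, so the same tables can be built with chr and passed straight to str.translate. The sketch below is an adaptation, not part of the original answer, and the helper names to_ascii and to_wide are hypothetical. It also runs the full cycle the question describes: decode the raw save bytes with the shift_jis codec (the question's 's-jis' is an alias for it), convert to ASCII for display, convert the possibly edited name back to FULLWIDTH, and re-encode.

```python
# Python 3 adaptation: str is Unicode, chr replaces unichr.
# ASCII 0x21-0x7E maps to FULLWIDTH U+FF01-U+FF5E by a fixed offset of 0xFEE0.
ascii_to_wide = {i: chr(i + 0xFEE0) for i in range(0x21, 0x7F)}
ascii_to_wide.update({0x20: '\u3000', 0x2D: '\u2212'})  # space and minus
wide_to_ascii = {i: chr(i - 0xFEE0) for i in range(0xFF01, 0xFF5F)}
wide_to_ascii.update({0x3000: ' ', 0x2212: '-'})         # space and minus


def to_ascii(wide_text):
    """Hypothetical helper: FULLWIDTH save name -> ASCII for display."""
    return wide_text.translate(wide_to_ascii)


def to_wide(ascii_text):
    """Hypothetical helper: ASCII name from the user -> FULLWIDTH."""
    return ascii_text.translate(ascii_to_wide)


# Raw bytes from the question, as read from the save file.
raw = (b'\x82g\x82\x81\x82\x8c\x82\x86\x81|\x82k\x82\x89\x82\x86\x82\x85'
       b'\x82r\x82\x99\x82\x93\x82\x94\x82\x85\x82\x8d\x81@\x82c\x82\x81'
       b'\x82\x94\x82\x81')

name = to_ascii(raw.decode('shift_jis'))
print(name)  # Half-LifeSystem Data

# Writing back: translate to FULLWIDTH and re-encode with the same codec.
encoded = to_wide(name).encode('shift_jis')
print(encoded == raw)  # True
```

The two update() calls matter because U+3000 and U+2212 lie outside the U+FF01 to U+FF5E block. Note that U+FF0D FULLWIDTH HYPHEN-MINUS also becomes '-', but it converts back to U+2212, the character found in this save.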


Source: https://stackoverflow.com/questions/16317534/unicode-full-width-to-standard-ascii-and-back-in-python
