Python 2 maketrans() function doesn't work with Unicode: “the arguments are different lengths” when they actually are the same length

Posted by 牧云@^-^@ on 2019-11-29 09:23:27

Question


[Python 2] SUB = string.maketrans("0123456789","₀₁₂₃₄₅₆₇₈₉")

This code produces the error:

ValueError: maketrans arguments must have same length

I am unsure why this occurs, because the two strings appear to be the same length. My only idea is that the subscript characters somehow have a different length from standard characters, but I don't know how to get around this.


Answer 1:


No, the arguments are not the same length:

>>> len("0123456789")
10
>>> len("₀₁₂₃₄₅₆₇₈₉")
30

You are trying to pass in encoded data: a Python 2 str literal is a byte string, so len() counts bytes rather than characters. I used UTF-8 here, where each subscript digit is encoded to 3 bytes.
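To make the length mismatch concrete, here is a minimal sketch that compares the character count with the UTF-8 byte count. The variable names are illustrative, and the `u"..."` literals are used so that the snippet should behave the same on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# The coding line above is needed on Python 2 for non-ASCII literals.

digits = u"0123456789"
subscripts = u"₀₁₂₃₄₅₆₇₈₉"

# Counted as Unicode code points, both strings hold 10 characters.
n_digits = len(digits)
n_subscripts = len(subscripts)

# Encoded as UTF-8, each ASCII digit stays 1 byte, while each
# subscript digit (U+2080 to U+2089) takes 3 bytes.
n_digit_bytes = len(digits.encode("utf-8"))
n_subscript_bytes = len(subscripts.encode("utf-8"))

print(n_digits, n_subscripts, n_digit_bytes, n_subscript_bytes)
```

The byte counts are what string.maketrans() sees in Python 2, which is why it reports arguments of different lengths.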

You cannot use str.translate() to map ASCII bytes to UTF-8 byte sequences. Decode your string to unicode and use the slightly different unicode.translate() method; it takes a dictionary instead:

nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}

This creates a dictionary mapping Unicode codepoints (integers), which you can then use on a Unicode string:

>>> nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}
>>> u'99 bottles of beer on the wall'.translate(nummap)
u'\u2089\u2089 bottles of beer on the wall'
>>> print u'99 bottles of beer on the wall'.translate(nummap)
₉₉ bottles of beer on the wall

You can then encode the output to UTF-8 again if you so wish.
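As a rough sketch of that round trip, assuming the input arrives as UTF-8 encoded bytes: decode, translate, then encode again. The helper name subscript_digits_utf8 is hypothetical and not part of any library:

```python
# -*- coding: utf-8 -*-

# Same mapping as above: code point of each digit -> code point of its subscript.
NUMMAP = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}


def subscript_digits_utf8(data):
    """Hypothetical helper: takes UTF-8 bytes, returns UTF-8 bytes."""
    text = data.decode("utf-8")          # bytes -> Unicode text
    translated = text.translate(NUMMAP)  # map digits to subscripts
    return translated.encode("utf-8")    # Unicode text -> bytes


result = subscript_digits_utf8(b"H2O")
```

Keeping the translation on Unicode text and encoding only at the boundary avoids ever treating a multi-byte sequence as separate characters.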

From the method documentation:

For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
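The quoted passage also allows table values that are Unicode strings or None. A small sketch of those two cases follows; the sample text and table entries are made up for illustration:

```python
# -*- coding: utf-8 -*-

table = {
    ord(u"2"): u"₂",     # ordinal -> Unicode string
    ord(u"&"): u"and",   # ordinal -> longer Unicode string
    ord(u"!"): None,     # ordinal -> None deletes the character
}

# Characters not listed in the table are left untouched.
result = u"H2O & CO2!".translate(table)
```

Here result is u"H₂O and CO₂": both digits are replaced, the ampersand expands to a word, and the exclamation mark is removed.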



Source: https://stackoverflow.com/questions/30108869/python-2-maketrans-function-doesnt-work-with-unicode-the-arguments-are-diff
