Truncating unicode so it fits a maximum size when encoded for wire transfer

借酒劲吻你 2020-12-29 21:57

Given a Unicode string and these requirements:

  • The string will be encoded into some byte-sequence format (e.g. UTF-8 or JSON unicode escape)
  • The encoded st
5 Answers
  •  無奈伤痛
    2020-12-29 22:23

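    def unicode_truncate(s, length, encoding='utf-8'):
        # Encode, then cut the byte string at the size limit;
        # the cut may land in the middle of a multi-byte character.
        encoded = s.encode(encoding)[:length]
        # 'ignore' drops the incomplete trailing bytes instead of raising.
        return encoded.decode(encoding, 'ignore')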
    

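    Here is an example with a Unicode string in which each character takes 2 bytes in UTF-8. The 5-byte limit falls in the middle of the third character, so that character is dropped and the result encodes to 4 bytes: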

    >>> unicode_truncate(u'абвгд', 5)
    u'\u0430\u0431'
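
The session above uses Python 2 literals. As a rough sketch of the same idea in Python 3 (the name `truncate_to_bytes` and the type hints are my own, not part of the answer), the function can be written and checked like this, assuming the target encoding is UTF-8:

```python
def truncate_to_bytes(text: str, max_bytes: int, encoding: str = "utf-8") -> str:
    """Return a prefix of `text` whose encoded form is at most max_bytes long."""
    # Cut the encoded bytes at the limit; this may split the last character.
    clipped = text.encode(encoding)[:max_bytes]
    # errors="ignore" discards the incomplete trailing byte sequence, if any.
    return clipped.decode(encoding, errors="ignore")


sample = "абвгд"  # each Cyrillic letter takes 2 bytes in UTF-8
result = truncate_to_bytes(sample, 5)
print(result)                       # аб
print(len(result.encode("utf-8")))  # 4
```

Because the input was validly encoded before the cut, the only bytes that can fail to decode in UTF-8 are the ones belonging to the character split at the end, so `ignore` should remove at most one partial character here. Other encodings, such as a JSON escape format, would probably need their own handling.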
    
