Truncating Unicode so it fits a maximum size when encoded for wire transfer

借酒劲吻你 2020-12-29 21:57

Given a Unicode string and these requirements:

  • The string must be encoded into some byte-sequence format (e.g. UTF-8 or JSON unicode escape)
  • The encoded st
5 Answers
  •  一个人的身影
    2020-12-29 22:13

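    Encode the string as UTF-8 and look at the first byte that would be cut off. If its top two bits are 10, it is a continuation byte, which means the cut falls in the middle of a multi-byte character; back up one byte at a time until the cut lands on a character boundary.

    # Python 3: indexing a bytes object yields an int.
    mxlen = 255
    encoded = toolong.encode("utf8")
    # Continuation bytes look like 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    while mxlen < len(encoded) and (encoded[mxlen] & 0xC0) == 0x80:
        mxlen -= 1

    truncated_string = encoded[:mxlen].decode("utf8")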
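The question also mentions JSON unicode escapes, where inspecting UTF-8 bytes does not help. One possible encoding-agnostic sketch, assuming Python 3, is to add up the encoded size of each character and stop before the budget is exceeded. The names `truncate_to_encoded_size`, `utf8_encode`, and `json_escape_encode` are hypothetical helpers made up for this illustration, and the approach assumes the encoding works character by character (true for UTF-8 and for the body of a JSON string with ASCII escapes).

```python
import json


def truncate_to_encoded_size(text, max_bytes, encode):
    """Return the longest prefix of `text` whose encoded form fits in max_bytes.

    `encode` maps one character to its encoded bytes. Assumption: encoding a
    string equals concatenating the encodings of its characters.
    """
    used = 0
    for index, char in enumerate(text):
        used += len(encode(char))
        if used > max_bytes:
            # This character would overflow the budget, so cut before it.
            return text[:index]
    return text


def utf8_encode(char):
    # One code point becomes 1 to 4 bytes in UTF-8.
    return char.encode("utf-8")


def json_escape_encode(char):
    # json.dumps wraps its result in double quotes; strip them to get the
    # escaped body for a single character. With the default ensure_ascii=True,
    # non-ASCII characters become \uXXXX escapes (two of them for characters
    # outside the Basic Multilingual Plane).
    return json.dumps(char)[1:-1].encode("ascii")


short_utf8 = truncate_to_encoded_size("héllo", 2, utf8_encode)
short_json = truncate_to_encoded_size("aé", 6, json_escape_encode)
```

Here `short_utf8` is `"h"`, because `é` needs two bytes in UTF-8 and would bring the total to three, and `short_json` is `"a"`, because `é` takes six bytes as a `\u00e9` escape. The JSON budget covers only the string body, so the two enclosing quotes have to be accounted for separately. The cut is made between code points, so a combining mark can still be separated from its base character.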
    
