Python: Get size of string in bytes

星月不相逢 2020-12-13 01:53

I have a string that is to be sent over a network. I need to find out how many bytes it will take up once it is encoded.

sys.getsizeof(string_name) returns extra bytes.

2 answers
  • 2020-12-13 01:55

    If you want the number of bytes a string takes up once it is encoded as UTF-8, this function should do it for you:

    def utf8len(s):
        return len(s.encode('utf-8'))
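
    As a quick illustration of why the byte count can differ from len(s), here is a small sketch that runs the same function over a few sample strings. The sample strings are my own picks, not from the question, and the result only applies if the data is actually sent as UTF-8.

```python
def utf8len(s: str) -> int:
    """Return the number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode('utf-8'))

# Hypothetical sample strings: ASCII, accented Latin, CJK, and an emoji.
samples = ['ciao', 'caf\u00e9', '\u661f\u6708', '\U0001F600']

for text in samples:
    # len(text) counts code points; utf8len(text) counts encoded bytes.
    print(repr(text), 'characters:', len(text), 'bytes:', utf8len(text))
```

    ASCII characters take one byte each, while the other characters take two to four bytes each, so the two counts only match for pure-ASCII text.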
    

    The reason you got larger numbers is that sys.getsizeof reports the memory footprint of the whole str object, not the size of its encoded text. Strings are full objects in Python, so each one carries bookkeeping such as a reference count, a pointer to its type, its length, and a cached hash.

    Note that methods such as 'encode' are not part of that count: they are stored once on the str type and shared by every string. The extra bytes come from the per-object header, which is why even an empty string has a nonzero sys.getsizeof. For data going over a network, the length of the encoded bytes is the number that matters.
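
    To see the difference between the in-memory object size and the encoded size, a minimal sketch like the following can help. The variable names are mine, and the exact sys.getsizeof values depend on the Python implementation and version, so none are quoted here.

```python
import sys

text = 'ciao'

# Memory used by the str object itself (implementation dependent).
object_size = sys.getsizeof(text)

# Bytes that would actually go over the wire if sent as UTF-8.
payload_size = len(text.encode('utf-8'))

# Even an empty string has a nonzero in-memory size.
empty_overhead = sys.getsizeof('')

print('sys.getsizeof(text):', object_size)
print('encoded length     :', payload_size)
print("sys.getsizeof('')  :", empty_overhead)
```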

  • 2020-12-13 02:18

    There's a caveat to the accepted answer.

    For some multi-byte encodings (e.g. utf-16), str.encode adds a Byte Order Mark (BOM) at the start, a short sequence of special bytes that tells the reader which byte order (endianness) was used. So the length you get is actually len(BOM) + len(encoded_word).

    If you don't want to count the BOM bytes, you can use either the little-endian version of the encoding (adding the suffix "-le") or the big-endian version (adding the suffix "-be").

    >>> len('ciao'.encode('utf-16'))
    10
    >>> len('ciao'.encode('utf-16-le'))
    8
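
    If you want to check for the BOM yourself, here is a small sketch using the BOM constants from the standard codecs module. The variable names are only for illustration.

```python
import codecs

word = 'ciao'

with_bom = word.encode('utf-16')          # native byte order, BOM prepended
little_endian = word.encode('utf-16-le')  # explicit byte order, no BOM
big_endian = word.encode('utf-16-be')     # explicit byte order, no BOM

# The plain 'utf-16' codec starts its output with one of the two UTF-16 BOMs.
bom_present = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

print('BOM present:', bom_present)
print('with BOM   :', len(with_bom))
print('utf-16-le  :', len(little_endian))
print('utf-16-be  :', len(big_endian))
```

    The difference between the first length and the other two is the length of the BOM, which is 2 bytes for UTF-16.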
    